Big Data and Blockchain Technology for Secure IoT Applications presents 
a comprehensive exploration of the intersection between two transforma- 
tive technologies: big data and blockchain, and their integration into secur- 
ing Internet of Things (IoT) applications. As the IoT landscape continues 
to expand rapidly, the need for robust security measures becomes para- 
mount to safeguard sensitive data and ensure the integrity of connected 
devices. This book delves into the synergistic potential of leveraging big 
data analytics and blockchain’s decentralized ledger system to fortify IoT 
ecosystems against various cyber threats, ranging from data breaches to 
unauthorized access. 

Within this groundbreaking text, readers will uncover the foundational 
principles underpinning big data analytics and blockchain technology, along 
with their respective roles in enhancing IoT security. Through insightful 
case studies and practical examples, this book illustrates how organizations 
across diverse industries can harness the power of these technologies to mit- 
igate risks and bolster trust in IoT deployments. From real-time monitoring 
and anomaly detection to immutable data storage and tamper-proof trans- 
actions, the integration of big data and blockchain offers a robust frame- 
work for establishing secure, transparent, and scalable IoT infrastructures. 

Furthermore, this book serves as a valuable resource for researchers, 
practitioners, and policymakers seeking to navigate the complexities of IoT 
security. By bridging the gap between theory and application, this book 
equips readers with the knowledge and tools necessary to navigate the 
evolving landscape of interconnected devices while safeguarding against 
emerging cyber threats. With contributions from leading experts in the 
field, it offers a forward-thinking perspective on harnessing the transfor- 
mative potential of big data and blockchain to realize the full promise of 
the IoT securely. 
Chapter 1

Scheduling Internet of Things tasks in cloud and fog computing environment using cuckoo search optimization


M. Santhosh Kumar and Ganesh Reddy Karri 


1.1 INTRODUCTION


The proliferation of Internet of Things (IoT) devices has led to an exponen- 
tial increase in the volume and diversity of data generated, necessitating 
efficient management strategies to ensure optimal utilization of computing 
resources [1]. Cloud and fog computing (CFC) paradigms have emerged 
as prominent solutions to address the challenges posed by the vast scale 
and heterogeneity of IoT deployments. In these environments, effective 
task scheduling (TS) plays a critical role in optimizing resource allocation, 
reducing latency, and meeting quality of service (QoS) requirements. 

TS in IoT scenarios involves the allocation of computational tasks gener- 
ated by IoT devices to appropriate computing resources, which may include 
cloud servers, edge devices, or a combination thereof. However, the dynamic 
nature of IoT environments, characterized by varying workloads, resource 
constraints, and network conditions, presents significant challenges for tra- 
ditional scheduling approaches [2]. In recent years, metaheuristic optimi- 
zation algorithms have garnered considerable attention for their ability to 
tackle complex optimization problems effectively. Among these, the cuckoo 
search optimization algorithm (CSOA) has shown promise in addressing 
various optimization tasks, including TS in cloud and IoT environments [3]. 

The CSOA, initially proposed in Ref. [4], is a metaheuristic approach 
inspired by the brood parasitism behavior observed in certain cuckoo 
species. Renowned for its prowess in tackling a diverse array of intricate 
optimization challenges, cuckoo search optimization (CSO) operates on 
principles reminiscent of the reproductive strategies employed by cuckoo 
birds. Just as these birds deposit their eggs in the nests of other species to 
enhance offspring survival, CSO utilizes candidate solutions (or “nests”) 
to represent potential solutions to optimization problems, evaluating their 
quality through a fitness function. Throughout the optimization process, 
CSO integrates randomization and local search mechanisms to traverse the 
solution space efficiently, ensuring a delicate balance between exploration 
and exploitation. 
An aspect that sets CSO apart is its straightforward implementation and user-friendly nature, rendering it accessible to professionals and researchers from diverse fields. Unlike conventional optimization methods, which frequently hinge on intricate mathematical models, CSO embraces a simple conceptual framework inspired by natural processes. Furthermore, CSO adapts to optimization problems of varying natures, be they continuous, discrete, or combinatorial. Its ability to balance exploration and exploitation allows CSO to converge toward near-optimal solutions, establishing it as a versatile tool for solving optimization problems across a broad spectrum of applications.

This study’s primary aim is to optimize the scheduling of IoT tasks within 
CFC environments by leveraging the CSOA. In IoT systems, efficient TS 
is essential to ensure optimal resource utilization, minimize latency, and 
meet QoS requirements [5,6]. However, the dynamic nature of IoT work- 
loads, coupled with the diverse computing resources available in cloud and 
fog environments, presents significant challenges for traditional scheduling 
approaches. By integrating the CSOA, known for its ability to effectively 
explore solution spaces and find optimal solutions in complex optimization 
problems, we aim to develop a scheduling framework that can adapt to the 
dynamic nature of IoT environments while improving resource allocation 
and system performance [7]. 

To achieve this aim, our objectives are multifaceted. First, we will con- 
duct a comprehensive analysis of the unique challenges associated with 
IoT TS in CFC environments, taking into account factors such as resource 
constraints, varying workloads, and real-time processing requirements [8]. 
Second, we will delve into the principles and mechanisms of the CSOA, 
exploring how its inherent search algorithms can be harnessed to address 
the complexities of IoT TS. Through this understanding, we will develop 
a CSOA-based scheduling framework tailored specifically to the require- 
ments of IoT deployments, aiming to optimize task allocation and improve 
overall system efficiency. 

Subsequently, we will validate the efficacy of the proposed CSOA-based 
scheduling approach through extensive simulations and comparative anal- 
yses against conventional scheduling algorithms commonly used in CFC 
environments. Performance evaluations will focus on metrics such as task 
completion time, resource utilization, and scalability, providing insights 
into the advantages and limitations of the CSOA-based approach. Finally, 
we will validate the practical applicability of the proposed framework 
through real-world experiments in representative IoT scenarios, demon- 
strating its potential for enhancing system performance and meeting the 
evolving demands of IoT applications in CFC environments. The contributions of this study are as follows:
• This study presents a new approach to IoT TS in CFC environments using the CSOA. Leveraging the CSOA's unique search mechanisms, the method addresses resource allocation, latency minimization, and QoS challenges in dynamic IoT ecosystems, expanding scheduling techniques for IoT deployments and enhancing system efficiency.

• Through extensive simulations, the CSOA-based scheduling approach is shown to improve resource utilization and system performance. Tasks are dynamically allocated to appropriate computing resources, optimizing completion time, reducing latency, and maximizing efficiency. This benefits IoT applications by ensuring reliable task execution and contributes to CFC infrastructure optimization.

• The research findings offer practical insights for designing and managing IoT systems in CFC environments. By showcasing the effectiveness and scalability of the CSOA-based approach, the study guides IoT practitioners and system designers in optimizing TS strategies. Validated performance in real-world IoT scenarios underscores the methodology's potential to meet user demands and service-level objectives effectively, facilitating seamless IoT application operation.


This chapter focuses on exploring the efficacy of utilizing the CSOA for 
TS in CFC architectures. By leveraging the unique search mechanisms tak- 
ing cues from the brood parasitism observed in cuckoo species, the CSOA 
aims to efficiently explore the solution space and discover optimal task 
allocation strategies. Through comprehensive evaluations and compara- 
tive analyses, we aim to demonstrate the advantages of the CSOA-based 
approach over traditional scheduling algorithms regarding their impact on 
task completion time, resource utilization, and the overall system performance. Furthermore, we explore how these findings can be applied to real-world IoT applications and identify potential directions for further research in this field.


1.2 LITERATURE SURVEY 


The literature survey on scheduling IoT tasks in CFC environments using the 
CSOA reveals a growing interest in addressing the challenges posed by the 
dynamic nature of IoT workloads and the diverse computing resources avail- 
able in cloud and fog architectures. Previous studies have explored various 
scheduling approaches, including traditional heuristic algorithms and meta- 
heuristic optimization techniques, to optimize resource utilization, minimize 
latency, and enhance QoS metrics in IoT deployments. However, while these 
methods have shown promise in specific scenarios, they often struggle to adapt 
to the dynamic and heterogeneous nature of IoT environments effectively. 
Emerging research suggests that metaheuristic optimization algorithms, 
such as CSOA, offer a promising avenue for addressing the complexities 
of IoT TS. Studies have demonstrated the effectiveness of CSOA in solving 
various optimization problems by mimicking the brood parasitism behavior 
of cuckoo species, enabling efficient exploration of solution spaces and dis- 
covery of optimal task allocation strategies. By applying CSOA to IoT TS in 
CFC environments, researchers aim to develop a novel methodology capable 
of dynamically allocating tasks to appropriate computing resources based 
on real-time conditions and constraints. This literature survey highlights 
the need for further investigation into CSOA-based scheduling approaches 
and their potential to improve resource utilization, system performance, 
and scalability in IoT deployments. 

Table 1.1 presents a comprehensive literature survey of existing works 
focused on TS within CFC environments. Each entry in the table highlights 
a specific scheduling technique, detailing its parameters, contributions, and 
limitations. Various machine learning, bio-inspired algorithms, and opti- 
mization approaches are examined in terms of their applicability to TS in 
CFC environments. Each technique’s contribution, such as effectiveness in 
high-dimensional spaces, state-of-the-art performance in image classifica- 
tion, or suitability for sequential data modeling, is carefully noted alongside 
its limitations, such as susceptibility to overfitting, computational intensity, 
or sensitivity to parameter choice. By presenting this literature survey in a 
structured table format, the overview facilitates a comparative analysis of 
different techniques, aiding researchers in identifying suitable methodolo- 
gies for their specific TS challenges in CFC environments. 

This literature survey underscores the diverse range of techniques 
employed in TS optimization in CFC, shedding light on their respective 
strengths and weaknesses. As CFC environments become increasingly 
prevalent in modern computing ecosystems, the need for efficient TS mech- 
anisms grows more pressing. By examining the contributions and limita- 
tions of various scheduling techniques, researchers can better understand 
the landscape of existing approaches and formulate informed strategies 
for optimizing resource allocation, minimizing task completion time, and 
enhancing overall system performance in CFC environments. The insights 
gleaned from this literature survey serve as a valuable foundation for future 
research endeavors aimed at advancing the cutting edge in cloud-fog TS and 
addressing the evolving challenges of modern computing paradigms. 


1.3 RESEARCH METHODOLOGY 


1.3.1 System model 


The system model for scheduling IoT tasks in CFC environments using the CSOA involves several key components and considerations. First, the IoT ecosystem comprises a diverse array of interconnected devices generating data and tasks, which are transmitted to cloud servers and edge devices for processing. Cloud servers typically offer high computational power and storage capacity but may introduce latency due to longer communication distances. On the other hand, edge devices located closer to IoT sensors can provide low-latency processing but may have limited resources (Figure 1.1).

Table 1.1 Detailed analysis of existing works in cloud-fog environment

| Ref. no. | Technique name | Parameters used | Contributions | Limitations |
|---|---|---|---|---|
| [9] | Dynamic self-organizing map (DSOM) | Makespan, response time, DoI | Effectively streamlines task scheduling (TS) within cloud-fog environments by emulating the brood parasitism behavior observed in cuckoo birds | May converge to local optima due to the stochastic nature of the algorithm; parameter tuning can be non-trivial, requiring careful adjustment for different problem instances |
| [10] | Ant colony optimization for time series (ACOTS) | Makespan, average waiting time | Exhibits exceptional performance by significantly decreasing task completion time and enhancing resource utilization when contrasted with conventional scheduling methodologies | Vulnerable to premature convergence, especially when dealing with complex scheduling scenarios with dynamic workload variations |
| [11] | Particle swarm optimization-based bullet (PSO-based BULLET) | Cost of execution, time duration, and energy expenditure, in addition to other quality of service (QoS) | Successfully strikes a harmonious balance between energy consumption and task completion time, catering to the unique demands of cloud-fog environments | May encounter challenges related to scalability when addressing extensive TS scenarios characterized by a multitude of tasks and resources; demands meticulous calibration of population size to avert premature convergence and ensure optimal performance |
| [12] | Modified ant colony optimization (MACO) | Makespan and degree of imbalance | Significantly improves the overall system performance by optimizing task allocation and resource provisioning in fog computing environments | Lack of robustness in handling noisy and dynamic environments, where task arrival rates and resource availability may fluctuate rapidly |
| [13] | Modified grey wolf optimizer (MGWO) | Makespan, energy consumption | Achieves near-optimal TS solutions by effectively exploring the solution space and exploiting promising regions | Limited by the absence of a mechanism to adaptively adjust algorithm parameters based on problem characteristics and environmental changes |
| [14] | Whale optimization algorithm (WOA) | Total communication cost | Demonstrates resilience to local optima and effectively explores diverse solutions, leading to improved TS performance in dynamic cloud-fog environments | Computational overhead associated with parameter tuning and sensitivity analysis may hinder its practical applicability in real-time TS scenarios |
| [15] | Balanced learning algorithm (BLA) | Execution time and memory consumption | Presents a compelling method for tackling TS hurdles in fog computing by effectively assigning tasks to suitable fog nodes, thereby optimizing resource utilization | Lack of theoretical guarantees on convergence properties and solution optimality may raise concerns about its reliability in critical applications |
| [16] | Enhanced learning based strategy (ELBS) | Robot workload ratio and mean difference | Demonstrates scalability and effectiveness in handling large-scale TS problems with multiple objectives and constraints in cloud-fog environments | Limited by the absence of mechanisms to incorporate dynamic changes in task priorities and resource availability during runtime scheduling decisions |
| [17] | Multi-objective evolutionary technique for efficient task scheduling (MEETS) | Energy efficiency | Supplies an adaptable framework for optimizing TS within fog computing environments, capable of accommodating diverse optimization goals and constraints | Lack of standardization in parameter settings and evaluation metrics may hinder the reproducibility and comparability of research findings across different studies |
| [18] | Generalized kinematic synthesis (GKS) | Energy consumption, execution cost, and sensor lifetime | Demonstrates adaptability and robustness in addressing uncertainties and fluctuations in cloud-fog environments, resulting in improved TS performance | Limited by the absence of mechanisms to handle heterogeneous resource characteristics and varying task requirements effectively |
| [19] | Optimized fuzzy clustering | Satisfaction of users and the efficiency of resource scheduling | Presents an encouraging solution for enhancing the TS in fog computing scenario, harnessing the exploratory prowess of the cuckoo search algorithm to discover solutions that are close to optimal | Lack of interpretability in the decision-making process may hinder the understanding of algorithm behaviors and solution quality assessment |
| [20] | Particle swarm optimization (PSO) | Makespan and energy consumption | Demonstrates efficiency and effectiveness in improving resource utilization and task completion time in fog computing environments with dynamic workload patterns | Vulnerable to stagnation when the algorithm converges prematurely to suboptimal solutions, especially in scenarios with complex task dependencies and constraints |
| [21] | Multi-objective firefly optimization-based cyber-physical system (MFO-based CPS) | Execution time, transfer time, and makespan | Tackles the complexities of TS in fog computing by furnishing resilient and scalable optimization solutions that dynamically adjust to evolving environmental factors | Lack of comprehensive benchmarking datasets and standard evaluation protocols may hinder the fair comparison and benchmarking of different CSO-based scheduling algorithms |
| [22] | Preprocessing phase (PP) | Communication bandwidth and transmission latency | Represents a promising avenue for addressing TS optimization challenges in fog computing, showcasing performance on par with other metaheuristic algorithms | Limited by the absence of mechanisms to handle uncertainties and dynamic changes in task characteristics and system conditions effectively |
| [23] | Comprehensive monitoring and analysis system (CMaS) | Unit cost of memory and storage, communication cost per data | Delivers a versatile and adaptive framework for optimizing TS in fog computing, adept at accommodating a wide range of optimization objectives and constraints | Lack of theoretical analysis and empirical validation in real-world fog computing environments may raise concerns about algorithm reliability and applicability |

In this system model, the objective is to efficiently allocate IoT tasks 
to cloud servers and edge devices based on real-time conditions and con- 
straints. Tasks may vary in computational requirements, deadline sensitiv- 
ity, and data dependencies, necessitating intelligent scheduling strategies 
to optimize resource utilization and meet QoS requirements. The CSOA, 
taking cues from the brood parasitism observed in cuckoo species, offers a 
promising approach to dynamically allocate tasks while balancing compu- 
tational loads across the CFC infrastructure. 

To implement the CSOA-based scheduling system, a set of parameters 
and constraints must be defined, including task characteristics (e.g., com- 
putational requirements, deadlines), resource availability (e.g., processing 
capacity, network bandwidth), and communication latency. The CSOA iteratively explores the solution space, with each iteration representing a potential task allocation scenario. During the search process, cuckoos (representing candidate solutions) deposit eggs (representing potential task allocations) in nests (representing computing resources) based on fitness evaluations. Through this iterative optimization process, the CSOA aims to converge on a task allocation strategy that minimizes task completion time, maximizes resource utilization, and satisfies QoS constraints in IoT deployments across CFC environments.

Figure 1.1 System architecture.


$$T_{nodes} = C_{nodes} + F_{nodes} \tag{1.1}$$

In the above equation, $T_{nodes}$ denotes the total number of nodes, i.e., the sum of the cloud nodes and fog nodes. $C_{nodes}$ and $F_{nodes}$ denote the cloud nodes and fog nodes, respectively.
The system model for TS in IoT tasks for CFC environments involves a 
hierarchical architecture designed to efficiently process and manage IoT- 
generated data and tasks. At the top level, cloud computing infrastructure 
provides vast computational resources and storage capabilities. However, 
due to the potential latency incurred by transmitting data to and from dis- 
tant cloud servers, intermediate fog computing nodes are introduced. Fog 
nodes, located closer to the IoT devices, offer low-latency processing and 
can offload computational tasks from the cloud. Additionally, edge devices, 
situated at the network periphery, further reduce latency by processing data 
locally, making them suitable for time-sensitive IoT applications. 

In this system model, efficiently scheduling IoT tasks across cloud, fog, and 
edge computing layers is paramount for optimizing resource utilization and 
adhering to the QoS standards. Leveraging the CSOA, tasks are dynamically allocated to the most appropriate computing resources, taking into account parameters such as task attributes, resource availability, and network conditions. By exploiting the CSOA's ability to efficiently explore solution spaces and find optimal task allocation strategies, the scheduling system aims to minimize task completion time, reduce latency, and enhance overall system performance.

To implement the CSOA-based TS system across the cloud, fog, and edge layers, the system model defines a set of parameters and constraints. These include task attributes (e.g., computational requirements, deadlines), resource characteristics (e.g., processing capacity, memory), and communication latency between different layers of the computing infrastructure. The CSOA iteratively evaluates potential task allocations across cloud, fog, and
edge nodes, aiming to converge on an allocation strategy that maximizes 
resource utilization, minimizes latency, and satisfies QoS constraints. By 
efficiently distributing IoT tasks across the cloud, fog, and edge layers, the 
scheduling system facilitates timely and reliable processing of IoT-generated 
data, enabling scalable and responsive IoT applications. 
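
The parameters listed above can be pictured as a small data model. The following is a minimal sketch under stated assumptions, not the chapter's implementation: the class and field names (`Node`, `Task`, `mips`, `latency_s`, and so on), the units, and the sample values are all hypothetical. Only the node count follows Equation (1.1), which sums cloud and fog nodes.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    CLOUD = "cloud"
    FOG = "fog"


@dataclass(frozen=True)
class Node:
    name: str
    layer: Layer
    mips: float        # processing capacity; unit (MIPS) is an assumption
    memory_mb: float   # available memory; unit is an assumption
    latency_s: float   # latency between the IoT device and this node (assumed)


@dataclass(frozen=True)
class Task:
    task_id: int
    length_mi: float   # computational requirement; unit (MI) is an assumption
    memory_mb: float
    deadline_s: float


def total_nodes(nodes):
    """Equation (1.1): T_nodes = C_nodes + F_nodes."""
    c_nodes = sum(1 for n in nodes if n.layer is Layer.CLOUD)
    f_nodes = sum(1 for n in nodes if n.layer is Layer.FOG)
    return c_nodes + f_nodes


# Illustrative values only; they are not taken from the chapter.
nodes = [
    Node("cloud-0", Layer.CLOUD, mips=4000.0, memory_mb=16384.0, latency_s=0.120),
    Node("fog-0", Layer.FOG, mips=1000.0, memory_mb=2048.0, latency_s=0.010),
    Node("fog-1", Layer.FOG, mips=800.0, memory_mb=1024.0, latency_s=0.008),
]
tasks = [Task(0, length_mi=2000.0, memory_mb=256.0, deadline_s=1.0)]
```

An edge layer could be added as a third `Layer` member; it is left out here because Equation (1.1) counts only cloud and fog nodes.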




1.3.1.1 Random workflow 


Random workflow usage in scheduling IoT tasks within CFC environments presents both challenges and opportunities. In scenarios where the arrival of IoT tasks follows a random pattern, traditional deterministic scheduling algorithms may struggle to effectively allocate resources and meet QoS requirements. Random workflow usage introduces variability and unpredictability in task arrival rates, computational demands, and network conditions, necessitating adaptive scheduling strategies capable of dynamically adjusting to changing workload patterns. However, this variability also offers opportunities for optimization, as it allows scheduling algorithms to explore a broader range of potential task allocation scenarios and adapt their decisions in real time to improve resource utilization and system performance (Figure 1.2).

Figure 1.2 Task random workflow.

The CSOA offers a promising approach to addressing the challenges 
posed by random workflow usage in IoT TS. By leveraging CSOA’s inherent 
ability to efficiently explore solution spaces and discover optimal task allo- 
cation strategies, the scheduling system can adapt to the dynamic nature 
of workload patterns and network conditions. The CSOA operates based 
on principles taking cues from the brood parasitism observed in cuckoo 
species, allowing it to dynamically adjust task allocations in response to 
changes in task characteristics, resource availability, and communication 
latency. Through iterative optimization, CSOA aims to converge on task 
allocation strategies that minimize task completion time, reduce latency, 
and maximize resource utilization in CFC environments. 

Incorporating CSOA into the scheduling system for IoT tasks in CFC envi- 
ronments enables the exploitation of random workflow usage to improve sys- 
tem efficiency and performance. By dynamically allocating tasks based on 
real-time conditions and constraints, the CSOA-based scheduling approach 
optimizes resource utilization, reduces response times, and enhances the 
scalability of IoT applications. Additionally, CSOA’s adaptability to chang- 
ing workload patterns and network conditions makes it well-suited for 
addressing the challenges posed by random workflow usage, offering a 
robust solution for optimizing IoT TS in dynamic CFC environments. 
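
To make the idea of a random workflow concrete, the sketch below generates tasks with random arrival times and random computational demands. It is an illustration under assumptions: the chapter does not specify any distribution, so the exponential inter-arrival times, the uniform task lengths, the value ranges, and the name `random_workflow` are all hypothetical choices.

```python
import random


def random_workflow(n_tasks, seed=None, mean_interarrival_s=0.5):
    """Generate n_tasks with random arrivals and demands (assumed distributions)."""
    rng = random.Random(seed)  # a seeded generator keeps experiments repeatable
    tasks = []
    arrival = 0.0
    for task_id in range(n_tasks):
        # Exponential gaps between arrivals are an assumption, not the chapter's model.
        arrival += rng.expovariate(1.0 / mean_interarrival_s)
        tasks.append({
            "id": task_id,
            "arrival_s": arrival,
            "length_mi": rng.uniform(100.0, 5000.0),       # assumed range
            "deadline_s": arrival + rng.uniform(1.0, 10.0),  # assumed slack
        })
    return tasks


workload = random_workflow(5, seed=42)
for task in workload:
    print(task["id"], round(task["arrival_s"], 3), round(task["length_mi"], 1))
```

Fixing the seed makes it possible to compare scheduling algorithms on an identical workload, which matters when results are averaged over several random runs.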
1.3.2 Problem formulation 


The problem formulation for scheduling IoT tasks in CFC environments 
using the CSOA begins with defining the key objectives and constraints 
of the scheduling system. The main objective is to allocate incoming IoT 
tasks to appropriate computing resources (cloud servers, fog nodes, or edge 
devices) in a manner that minimizes task completion time, reduces latency, 
and maximizes resource utilization. Additionally, the system must consider 
various constraints, including the computational capabilities of each com- 
puting resource, communication latency between devices, and QoS require- 
ments such as task deadlines and reliability. 

In essence, the problem can be formalized as an optimization challenge 
aimed at identifying an ideal task allocation strategy that minimizes a pre- 
defined objective function while adhering to all constraints. Within this 
framework, decision variables signify the assignment of each IoT task to 
a particular computing resource, while the objective function gauges the 
overall system performance by considering factors such as task comple- 
tion time and resource utilization. The incorporation of constraints serves 
to enforce adherence to resource availability, communication latency, and 
QoS criteria, thereby guaranteeing the feasibility and dependability of the 
scheduling solution. 
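
One way to read this formulation is as an assignment vector with a feasibility check. The sketch below assumes that entry `i` of `assignment` holds the index of the node that runs task `i`, and it checks only two of the constraints named above (deadlines and a memory capacity). The function name, the dictionary keys, and the simplification that tasks on the same node do not wait for each other are assumptions made for illustration.

```python
def is_feasible(assignment, tasks, nodes):
    """Return True if every task meets its deadline and no node exceeds its memory.

    Simplification (assumed): queuing between tasks on the same node is ignored.
    """
    used_memory = [0.0] * len(nodes)
    for task, j in zip(tasks, assignment):
        node = nodes[j]
        finish_s = node["latency_s"] + task["length_mi"] / node["mips"]
        if finish_s > task["deadline_s"]:
            return False
        used_memory[j] += task["memory_mb"]
    return all(used <= node["memory_mb"] for used, node in zip(used_memory, nodes))


# Illustrative values only.
nodes = [
    {"mips": 4000.0, "memory_mb": 8192.0, "latency_s": 0.12},  # cloud-like node
    {"mips": 1000.0, "memory_mb": 512.0, "latency_s": 0.01},   # fog-like node
]
tasks = [
    {"length_mi": 2000.0, "memory_mb": 256.0, "deadline_s": 1.0},
    {"length_mi": 500.0, "memory_mb": 400.0, "deadline_s": 0.6},
]

print(is_feasible([0, 1], tasks, nodes))  # both deadlines and memory limits hold
print(is_feasible([1, 1], tasks, nodes))  # task 0 is too slow on the fog-like node
```

A search procedure can either discard infeasible assignments or add a penalty for them to the objective function; which option the chapter uses is not stated in this section.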

The complexity of the problem arises from the dynamic nature of IoT 
workloads, which exhibit variability in task arrival rates, computational 
demands, and network conditions. Moreover, the heterogeneous nature of 
computing resources in cloud and fog environments adds another layer of 
complexity, requiring adaptive scheduling strategies capable of dynamically 
adjusting task allocations based on real-time conditions. By formulating the 
problem as an optimization task and leveraging the CSOA, the scheduling 
system aims to address these challenges by efficiently exploring solution 
spaces and discovering optimal task allocation strategies that enhance sys- 
tem efficiency and performance in IoT deployments. 


1.3.2.1 Objective function 


The objective function serves as a measure of the system's performance, guiding the optimization process to find the most efficient task allocation strategy. The objective function can be formulated to balance various factors 
such as task completion time, resource utilization, and QoS requirements. 
Mathematically, the objective function f(x) can be expressed as a combina- 
tion of these factors: 


$$f(x) = w_1 \cdot T_c(x) + w_2 \cdot U(x) + w_3 \cdot QoS(x) \tag{1.2}$$


In this context, let $x$ denote the task allocation solution, where $T_c(x)$ represents the task completion time, $U(x)$ denotes resource utilization, and $QoS(x)$ signifies a measure of QoS satisfaction. The weights $w_1$, $w_2$, and $w_3$ are utilized to gauge the relative importance of each factor within the objective function. The optimization of the objective function involves identifying the task allocation solution $x^*$ that either minimizes or maximizes the overall system performance based on the specified criteria.
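
The weighted sum can be sketched in a few lines. The first function below follows the weighted-sum form of Equation (1.2) literally. The second is an assumed variant, not taken from the chapter: because a lower completion time is better while higher utilization and QoS satisfaction are better, it flips the last two terms so that a single value can be minimized. It also assumes all three inputs are normalized to the range 0 to 1. The function names and default weights are hypothetical.

```python
def objective(t_c, u, qos, w1=1.0, w2=1.0, w3=1.0):
    """Literal weighted sum of Equation (1.2)."""
    return w1 * t_c + w2 * u + w3 * qos


def minimization_cost(t_c_norm, u, qos, w1=0.5, w2=0.25, w3=0.25):
    """Assumed variant in which smaller is always better.

    t_c_norm, u and qos are expected to lie in [0, 1] (an assumption).
    """
    return objective(t_c_norm, 1.0 - u, 1.0 - qos, w1, w2, w3)


# Two hypothetical allocations with the same completion time and QoS:
cost_a = minimization_cost(0.4, u=0.9, qos=1.0)
cost_b = minimization_cost(0.4, u=0.5, qos=1.0)
print(cost_a < cost_b)  # the better-utilized allocation gets the lower cost
```

The choice of weights decides which factor dominates; with `w1` much larger than the others, the search mostly shortens completion time.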

The CSOA is employed to optimize the objective function by iteratively 
exploring solution spaces and adjusting task allocations to improve system 
performance. CSOA dynamically adjusts task allocations based on real- 
time conditions and constraints, aiming to converge on an optimal solution 
that minimizes task completion time, maximizes resource utilization, and 
satisfies QoS requirements. Through efficient optimization facilitated by the 
CSOA, IoT applications can achieve improved system efficiency, reduced 
response times, and enhanced user experience in CFC environments. 
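
The iterative search described above can be illustrated with a simplified, discrete cuckoo search. This is a sketch under assumptions and should not be read as the chapter's proposed algorithm: the cost is plain makespan rather than the full objective of Equation (1.2), random reassignment of a few tasks stands in for the Lévy flights used in the usual continuous formulation of cuckoo search, and all names and parameter values (`n_nests`, `pa`, `iterations`) are hypothetical.

```python
import random


def makespan(assignment, lengths, speeds):
    """Finish time of the busiest node (assumed stand-in for f(x))."""
    load = [0.0] * len(speeds)
    for length, j in zip(lengths, assignment):
        load[j] += length / speeds[j]
    return max(load)


def cuckoo_search(lengths, speeds, n_nests=15, pa=0.25, iterations=200, seed=0):
    rng = random.Random(seed)
    n_tasks, n_nodes = len(lengths), len(speeds)

    def cost(nest):
        return makespan(nest, lengths, speeds)

    def random_nest():
        return [rng.randrange(n_nodes) for _ in range(n_tasks)]

    def perturb(nest):
        # Stand-in for a Levy flight: reassign a few randomly chosen tasks.
        new = nest[:]
        for _ in range(rng.randint(1, max(1, n_tasks // 4))):
            new[rng.randrange(n_tasks)] = rng.randrange(n_nodes)
        return new

    nests = [random_nest() for _ in range(n_nests)]
    best = min(nests, key=cost)
    for _ in range(iterations):
        # A new egg derived from a random nest replaces another random nest if it is better.
        egg = perturb(rng.choice(nests))
        k = rng.randrange(n_nests)
        if cost(egg) < cost(nests[k]):
            nests[k] = egg
        # Abandon the worst fraction pa of nests and rebuild them at random.
        nests.sort(key=cost)
        for i in range(n_nests - int(pa * n_nests), n_nests):
            nests[i] = random_nest()
        # Remember the best allocation seen so far.
        if cost(nests[0]) < cost(best):
            best = nests[0][:]
    return best, cost(best)


# Illustrative workload: task lengths and node speeds are made-up numbers.
lengths = [400.0, 300.0, 300.0, 200.0, 200.0, 100.0, 100.0]
speeds = [2.0, 1.0, 1.0]
allocation, value = cuckoo_search(lengths, speeds, seed=3)
print(allocation, value)
```

Replacing `makespan` with a function that combines completion time, utilization, and QoS would bring the sketch closer to the weighted objective; the search loop itself would not need to change.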


1.3.2.2 Task completion time 


Task completion time is a critical metric in scheduling IoT tasks in CFC 
environments, as it directly impacts the responsiveness and efficiency of 
IoT applications. Mathematically, the task completion time $T_c$ can be represented as the sum of the processing times $P_i$ of all IoT tasks $i$, where each $P_i$ accounts for the time taken for computation, communication, and any potential queuing delays:

$$T_c = \sum_{i=1}^{n} P_i \tag{1.3}$$


In this scenario, let us denote $n$ as the total number of IoT tasks requiring 
scheduling. The processing time $P_i$ for each task is contingent on multiple 
factors, such as the computational demands of the task, the processing capa- 
bilities of the allocated computing resource, and the communication latency 
between devices. In CFC environments, where tasks can span across vari- 
ous computing layers, the task completion time is impacted by the efficacy 
of task allocation strategies and the optimization of resource utilization. 
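
As a worked illustration of Equation (1.3), the sketch below computes each $P_i$ as computation time plus communication latency plus queuing delay, and sums the results. The decomposition into these three terms follows the text; the concrete model is assumed: tasks placed on the same node run one after another in list order, latency is a fixed per-node value, and the names `processing_time` and `total_completion_time` are hypothetical.

```python
def processing_time(length_mi, node_mips, latency_s, queue_delay_s=0.0):
    """P_i: computation + communication + queuing delay for one task."""
    return length_mi / node_mips + latency_s + queue_delay_s


def total_completion_time(lengths_mi, nodes, assignment):
    """Equation (1.3): T_c is the sum of P_i over all n tasks.

    Assumption: a task waits for the compute time already queued on its node.
    """
    queued = [0.0] * len(nodes)
    total = 0.0
    for length, j in zip(lengths_mi, assignment):
        mips, latency = nodes[j]
        total += processing_time(length, mips, latency, queue_delay_s=queued[j])
        queued[j] += length / mips
    return total


# (mips, latency_s) pairs; illustrative values only.
nodes = [(4000.0, 0.1), (1000.0, 0.01)]
lengths_mi = [2000.0, 1000.0, 500.0]

# P_1 = 0.5 + 0.1 = 0.6, P_2 = 1.0 + 0.01 = 1.01, P_3 = 0.125 + 0.1 + 0.5 = 0.725
print(total_completion_time(lengths_mi, nodes, assignment=[0, 1, 0]))  # about 2.335
```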

The CSOA plays a crucial role in minimizing the task completion time by 
dynamically allocating tasks to suitable computing resources based on real- 
time conditions and constraints. By iteratively exploring solution spaces 
and optimizing task allocations, CSOA aims to converge on allocation 
strategies that minimize the overall task completion time while consider- 
ing factors such as resource availability, communication latency, and QoS 
requirements. Through efficient TS facilitated by CSOA, IoT applications 
can achieve reduced response times, improved system performance, and 
enhanced user experience. 


1.3.2.3 Resource utilization 


Resource utilization is a critical aspect of scheduling IoT tasks in CFC 
environments, as it directly impacts the efficiency and cost-effectiveness of
resource allocation. Mathematically, resource utilization $U$ can be defined
as the ratio of the total time that computing resources are actively
processing tasks to the total available time:

$$U = \frac{T_{active}}{T_{total}} \qquad (1.4)$$

Here, $T_{active}$ represents the total time that computing resources are
actively processing tasks, while $T_{total}$ represents the total available
time. Efficient TS
strategies aim to maximize resource utilization by minimizing idle time and 
ensuring that computing resources are effectively utilized to process IoT 
tasks. This involves dynamically allocating tasks to computing resources 
based on real-time conditions and constraints, such as computational capa- 
bilities, communication latency, and QoS requirements. 
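
Equation (1.4) can be sketched in a few lines of Python. The example below
assumes that each resource records the (start, end) intervals during which it
is busy and that these intervals do not overlap. The resource names, the
interval values, and the averaging over resources are illustrative
assumptions rather than part of the chapter.

```python
def resource_utilization(busy_intervals, total_time):
    """Equation (1.4): U = T_active / T_total for one resource.

    busy_intervals is a list of (start, end) times during which the
    resource processes tasks; intervals are assumed not to overlap.
    """
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    t_active = sum(end - start for start, end in busy_intervals)
    return t_active / total_time


def mean_utilization(schedule, total_time):
    """Average U over all resources in a schedule {resource: intervals}."""
    values = [resource_utilization(iv, total_time) for iv in schedule.values()]
    return sum(values) / len(values)


# Hypothetical schedule observed over a window of 10 time units.
schedule = {
    "fog-1": [(0, 2), (3, 6)],  # busy for 5 of 10 time units
    "cloud-1": [(0, 10)],       # busy for the whole window
}
u_fog = resource_utilization(schedule["fog-1"], total_time=10)
u_mean = mean_utilization(schedule, total_time=10)
print(u_fog, u_mean)  # 0.5 0.75
```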

The CSOA plays a crucial role in optimizing resource utilization by 
iteratively exploring solution spaces and optimizing task allocations. By 
dynamically allocating tasks to appropriate computing resources based 
on workload patterns and resource availability, CSOA aims to maximize 
resource utilization while minimizing idle time. Through efficient resource 
allocation facilitated by CSOA, IoT applications can achieve optimal uti- 
lization of CFC resources, leading to improved system performance and 
cost-effectiveness. 


1.3.2.4 Proposed algorithm 
The pseudo code for the proposed CSO algorithm is given below: 


Input: task characteristics, task priorities, and task requirements

function CSOA_Task_Scheduling(IoT_tasks, cloud_servers, fog_nodes):
    Initialize population of cuckoos randomly
    Evaluate fitness of each cuckoo in the population
    while termination condition is not met do:
        Choose cuckoos for egg laying
        Generate new solutions by performing Lévy flights
        Evaluate fitness of new solutions
        Replace nests with worse solutions
        Abandon nests with probability pa
    end while
    return best solution found

function Initialize_population():
    Initialize a population of cuckoos randomly
    return population

function Evaluate_fitness(solution):
    Calculate task completion time for the given solution
    Calculate resource utilization for the given solution
    Calculate QoS satisfaction for the given solution
    Calculate fitness as a combination of the above factors
    return fitness

function Choose_cuckoos_for_egg_laying(population):
    Choose cuckoos for egg laying based on fitness
    return selected_cuckoos

function Generate_new_solutions(selected_cuckoos):
    Perform Lévy flights to generate new solutions
    return new_solutions

function Replace_nests(population, new_solutions):
    Replace nests with worse solutions from new solutions
    return updated_population

function Abandon_nests(population):
    Abandon nests with a probability pa
    return updated_population

Output: Best solution obtained


In this pseudo code, the CSOA_Task_Scheduling function initializes a
population of cuckoos, evaluates the fitness of each cuckoo, and iteratively
performs the CSOA until a termination condition is met. The
Initialize_population function initializes the population of cuckoos
randomly, while the Evaluate_fitness function calculates the fitness of each
solution based on task completion time, resource utilization, and QoS
satisfaction. The Choose_cuckoos_for_egg_laying function selects cuckoos for
egg laying based on their fitness, and the Generate_new_solutions function
generates new solutions by performing Lévy flights. The Replace_nests
function replaces nests that hold worse solutions with the new solutions,
and the Abandon_nests function abandons nests with probability pa.
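
The loop described above can be sketched in Python as follows. This is a
simplified illustration, not the authors' implementation: the fitness is
reduced to the makespan (the busiest node's total processing time) instead of
the combined completion time, utilization, and QoS measure; Lévy steps are
drawn with Mantegna's algorithm; and a continuous nest position is mapped to
a node index by truncation. The function names and the values of n_nests, pa,
alpha, and iters are assumptions chosen for the example.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def clip(x, n_nodes):
    """Keep a coordinate inside [0, n_nodes) so it decodes to a valid node."""
    return min(max(x, 0.0), n_nodes - 1e-9)


def decode(position, n_nodes):
    """Map a continuous nest position to a task -> node allocation."""
    return [min(int(x), n_nodes - 1) for x in position]


def fitness(position, task_mi, node_mips):
    """Lower is better: total processing time of the busiest node."""
    load = [0.0] * len(node_mips)
    for mi, node in zip(task_mi, decode(position, len(node_mips))):
        load[node] += mi / node_mips[node]
    return max(load)


def cuckoo_search(task_mi, node_mips, n_nests=15, pa=0.25, alpha=1.0,
                  iters=200, seed=0):
    rng = random.Random(seed)
    n_nodes, dim = len(node_mips), len(task_mi)

    def random_nest():
        return [clip(rng.uniform(0, n_nodes), n_nodes) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [fitness(n, task_mi, node_mips) for n in nests]
    for _ in range(iters):
        # Lay an egg: perturb a randomly chosen nest with a Levy flight.
        i = rng.randrange(n_nests)
        egg = [clip(x + alpha * levy_step(rng), n_nodes) for x in nests[i]]
        egg_score = fitness(egg, task_mi, node_mips)
        # Replace a randomly chosen nest if the new egg is better.
        j = rng.randrange(n_nests)
        if egg_score < scores[j]:
            nests[j], scores[j] = egg, egg_score
        # Abandon the worst fraction pa of nests and rebuild them randomly.
        order = sorted(range(n_nests), key=scores.__getitem__)
        for k in order[n_nests - int(pa * n_nests):]:
            nests[k] = random_nest()
            scores[k] = fitness(nests[k], task_mi, node_mips)
    best = min(range(n_nests), key=scores.__getitem__)
    return decode(nests[best], n_nodes), scores[best]


# Hypothetical workload: four tasks (in MI) and two nodes (in MIPS).
allocation, makespan = cuckoo_search([4000, 2000, 1000, 3000], [2000.0, 4000.0])
print(allocation, makespan)
```

Because the best nest is never among the abandoned ones, the best score
cannot get worse from one iteration to the next. A fixed seed makes a run
repeatable, which helps when comparing parameter settings.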


1.4 RESULTS AND DISCUSSION 


1.4.1 Results 


Simulation serves as a pivotal platform for assessing the efficacy of TS algo- 
rithms within CFC environments. In this study, we present the outcomes of 
simulations aimed at scrutinizing the performance of TS using CSO. The 
simulation setup was meticulously configured to mimic real-world cloud- 
fog environments, encompassing various parameters crucial for accurate 
representation. The methods were implemented and tested in the CloudSim
3.0.3 simulation framework, integrated with Java programming, to emulate
cloud-fog computing environments. The experiments ran on a personal
computer equipped with an Intel Core i7-8550U CPU (8 logical cores clocked
between 1.80 and 2.0 GHz), 16 GB of RAM, and the Windows 10 OS. This
hardware configuration supported efficient execution and evaluation of the
TS algorithms. Table 1.2 lists the simulation configuration.

Table 1.2 Simulation configuration setup details

Layer   Parameter                      Value
Cloud   Total virtual machines (VMs)   [15, 20, 25]
Cloud   Computing power (MIPS)         [2000:4000]
Cloud   RAM (MB)                       [5000:20000]
Cloud   Bandwidth (Mbps)               [512:4096]
Fog     Total VMs                      [10, 15, 20]
Fog     Computing power (MIPS)         [2000:4000]
Fog     RAM (MB)                       [250:5000]
Fog     Bandwidth (Mbps)               [128:1024]

In configuring the simulations, a custom-developed cloud-fog computing 
simulator was employed, offering a sophisticated framework for replicat- 
ing dynamic TS scenarios. The simulator, implemented in CloudSim [24], 
enables the emulation of diverse network topologies, task workloads, and 
resource characteristics typical of cloud-fog environments. Key configura- 
tion details, including network topology, workload generation patterns, 
resource heterogeneity, and algorithmic parameters, were carefully chosen 
to ensure fidelity to real-world conditions. Through this comprehensive 
simulation setup, we aimed to delve into the intricacies of TS optimiza- 
tion within CFC environments and provide insights into the performance 
of CSO in addressing these challenges. 
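
To make the configuration in Table 1.2 concrete, the sketch below draws
hypothetical VM descriptions from the listed ranges. It is written in Python
for brevity and is not CloudSim code. Sampling each value uniformly at random
within its range is an assumption, since the chapter does not state how
values were chosen, and the names TABLE_1_2 and sample_vms are illustrative.

```python
import random

# Ranges copied from Table 1.2. Drawing values uniformly at random inside
# each range is an assumption; the chapter does not state the distribution.
TABLE_1_2 = {
    "cloud": {"vm_counts": (15, 20, 25), "mips": (2000, 4000),
              "ram_mb": (5000, 20000), "bandwidth_mbps": (512, 4096)},
    "fog": {"vm_counts": (10, 15, 20), "mips": (2000, 4000),
            "ram_mb": (250, 5000), "bandwidth_mbps": (128, 1024)},
}


def sample_vms(layer, n_vms, rng):
    """Return n_vms hypothetical VM descriptions for the given layer."""
    spec = TABLE_1_2[layer]
    if n_vms not in spec["vm_counts"]:
        raise ValueError(f"{n_vms} VMs is not a configured size for {layer}")
    return [
        {
            "layer": layer,
            "mips": rng.randint(*spec["mips"]),
            "ram_mb": rng.randint(*spec["ram_mb"]),
            "bandwidth_mbps": rng.randint(*spec["bandwidth_mbps"]),
        }
        for _ in range(n_vms)
    ]


rng = random.Random(42)
vms = sample_vms("cloud", 15, rng) + sample_vms("fog", 10, rng)
print(len(vms), vms[0])
```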


1.4.1.1 Task completion time 


Our proposed CSOA demonstrates superior performance compared to
traditional approaches such as Ant Colony Optimization (ACO), Particle
Swarm Optimization (PSO), and harmony search optimization (HSO)
when evaluated in terms of task completion time. Through extensive
simulations and benchmarking, CSOA consistently outperforms these
conventional algorithms by effectively leveraging the exploratory and
exploitative capabilities inherent in the cuckoo search algorithm. By
balancing exploration and exploitation, CSOA converges quickly toward
near-optimal TS solutions, thereby minimizing task completion time and
enhancing overall system efficiency. Moreover, CSOA’s adaptability to
dynamic environments and robustness against premature convergence further
support its advantage over ACO, PSO, and HSO, positioning it as a
promising optimization technique for addressing TS challenges in CFC
environments (Figure 1.3).

Figure 1.3 Calculation of task completion time.


1.4.1.2 Resource utilization 


Our proposed CSOA significantly outperforms traditional methodologies such
as ACO, PSO, and HSO in terms of resource utilization. Through rigorous
experimentation and comparative analysis, CSOA demonstrates superior
efficiency in allocating tasks to available resources within cloud-fog
computing environments. By leveraging the inherent exploratory and
exploitative capabilities of the cuckoo search algorithm, CSOA balances
resource allocation, minimizing resource idle time and maximizing
utilization rates. Due to its adaptive nature, CSOA can adjust scheduling
decisions according to real-time fluctuations in resource availability and
workload, thereby maintaining high resource utilization across varying
operational scenarios. This characteristic positions CSOA as a resilient
and potent solution for optimizing resource utilization within cloud-fog
environments, surpassing the performance of ACO, PSO, and HSO in this
crucial aspect of TS optimization (Figure 1.4).

Figure 1.4 Calculation of resource utilization.


1.4.2 Discussion 


CSO demonstrates notable effectiveness in optimizing TS within cloud-fog
environments. Drawing on principles inspired by the brood parasitism
observed in cuckoo species, CSO efficiently explores the solution space,
leading to near-optimal task allocation and resource provisioning. This
capability is most apparent in how it balances exploration and
exploitation, which yields competitive performance regarding task
completion time and resource utilization.

Moreover, when compared to other optimization techniques such as 
genetic algorithm (GA), PSO, and ACO, CSO often exhibits superior per- 
formance. Its ability to handle complex, nonlinear optimization problems 
with high-dimensional search spaces makes it well-suited for the dynamic 
and heterogeneous nature of CFC environments. Additionally, CSO’s sim- 
plicity and ease of implementation contribute to its appeal as a practical 
solution for TS optimization in CFC. 

However, despite its strengths, CSO is not without limitations. One
notable limitation is its sensitivity to parameter settings, including
population size, step size, and abandonment (discovery) probability. Poorly
chosen parameter values may lead to premature convergence or suboptimal
solutions, highlighting the importance of parameter tuning for optimal
performance. Furthermore, CSO’s reliance on randomization and stochastic
processes can result in non-deterministic behavior, making it challenging to
predict its performance accurately in all scenarios.

While CSO presents a promising approach for TS optimization in CFC, 
further research is needed to address its limitations and enhance its scal- 
ability, robustness, and adaptability to diverse operating conditions. By 
exploring techniques to improve parameter-tuning mechanisms, incorpo- 
rating domain-specific knowledge, and investigating hybrid optimization 
approaches, CSO can continue to evolve as a valuable tool for enhancing 
the efficiency and effectiveness of TS in CFC environments. 


1.4.2.1 Limitations 


1. Scalability Challenges: CSO may face scalability issues when dealing 
with large-scale IoT TS problems involving a high number of tasks 
and resources. With the escalation of the optimization problem’s 
size, CSO’s computational complexity could reach a point where it 
becomes impractical, resulting in lengthier optimization durations 
and heightened memory demands. 

2. Sensitivity to Parameter Tuning: The performance of CSO relies heav- 
ily on the meticulous selection of algorithmic parameters, such as 
the population size, step size, and abandonment probability. Finding 
the optimal parameter configuration for a given IoT TS scenario 
can be challenging and may require extensive experimentation and 
fine-tuning. 

3. Lack of Adaptability to Dynamic Environments: CSO’s effective-
ness may diminish in dynamic CFC environments where task char-
acteristics, resource availability, and network conditions are subject
to frequent fluctuations. The static nature of CSO might impede its
capacity to swiftly adapt to shifting environmental conditions, pos-
sibly resulting in suboptimal TS choices.

4. Limited Handling of Heterogeneous Resources: CSO may struggle to 
effectively allocate tasks in environments with heterogeneous com- 
puting resources, such as varying processing capabilities, memory 
capacities, and energy profiles. In such scenarios, CSO may prioritize 
certain resources over others, leading to uneven resource utilization 
and potentially degraded system performance. 

5. Lack of Guarantees on Solution Quality: While CSO often converges 
to near-optimal solutions, there are no guarantees of optimality 
or convergence to the global optimum. Depending on the problem 
instance and parameter settings, CSO may converge to suboptimal 
solutions or become trapped in local optima, limiting its ability to find 
the best possible TS solution. 


Addressing these limitations requires further research and development
efforts aimed at enhancing CSO’s scalability, adaptability, and robustness 
in the context of scheduling IoT tasks in CFC environments. Additionally, 
exploring hybrid optimization approaches and incorporating domain- 
specific knowledge may help mitigate these limitations and improve CSO’s 
effectiveness in real-world IoT deployment scenarios. 


1.5 CONCLUSION AND FUTURE WORK 


In conclusion, the application of CSO for TS in CFC holds significant 
promise and has garnered considerable attention within the research com- 
munity. Through its ability to efficiently explore the solution space and 
balance exploration with exploitation, CSO has demonstrated competi- 
tive performance in optimizing task completion time and resource utili- 
zation in dynamic cloud-fog environments. Despite its effectiveness, CSO 
is not without challenges, particularly its sensitivity to parameter settings 
and reliance on stochastic processes. Addressing these challenges through 
further research on parameter-tuning mechanisms, hybrid optimization 
approaches, and incorporation of domain-specific knowledge is essential to 
unlock the full potential of CSO for TS optimization in CFC. Overall, CSO 
represents a valuable tool in the quest to enhance the efficiency, scalability, 
and adaptability of TS algorithms, contributing to the advancement of CFC 
systems and their applications in various domains.


Chapter 2


SDN-IoT integration with
network function virtualization 
for improved performance 


CH N. Santhosh Kumar, Babu Pandipati, 
Marepalli Radha, and Gouthami Velakanti 


2.1 INTRODUCTION 


The IoT allows for the interconnection of many devices, resulting in vast 
networks that may function independently and anywhere. A wide variety 
of Internet-connected accessories, including computers, cellphones, home 
appliances, industrial systems, e-health gadgets, surveillance equipment, 
sensors for precision agriculture, and more, make up this interconnected 
landscape. Forecasts indicate that, by 2020 [1], the number of these inter- 
connected devices would exceed 45 billion, with a monetary worth surpass- 
ing USD 14 billion. There will be a need to deploy more network access 
and core devices due to the massive volumes of data generated by all these 
devices. Numerous technological obstacles exist in ensuring the seamless 
operation and integration of such massive IoT systems. These obstacles
include data collection and analysis, privacy, security, the topology of
IoT nodes, communication protocols, edge access, and application and
device heterogeneity. In addition, flows may need to be re-established
because of dynamic topological changes introduced by the mobility of IoT
devices [2]. The wide variety of applications makes an already complicated
ecosystem even more so. Virtualization and the programmability of
software and hardware resources provide a practical way to reduce
complexity, even though it may be impractical to create a single solution for 
all of these problems. Through the use of software, virtualization allows 
for the logical abstraction of a network’s underlying hardware elements. 
By decoupling control from hardware, this abstraction makes it easier to 
administer, update, and modify. New developments in virtualization have 
expanded its scope to include software integrated in hardware, which is 
now considered as a separate virtual function element [3]. 

The fast growth of the Internet has made the problems of heterogene- 
ity, scalability, and interoperability even more severe for conventional 
networks, which are notoriously rigid and unchanging. Network Function 
Virtualization (NFV) [4] and Software-Defined Networks (SDNs) [9] are 
two key virtualization techniques for communication networks that have 
evolved to tackle these difficulties. By turning the physical infrastructure
into a forwarding (data) plane and centralizing the control functions of
routing devices, SDN completely changes the way networks are designed. 
Here, the controller is in charge of all policy and flow-related data, and 
the protocol that bridges the gap between the control and data planes is 
OpenFlow (OF) [5]. There are alternative ways that provide comparable 
functionality; these are cited in Refs. [6,7]. The ability to enforce configura- 
tions, regulations, and flows across the whole network is the main benefit 
of SDN. In addition to these benefits, SDN also allows for vendor indepen- 
dence, virtualization of network slices, improved security, and optimization 
of resource utilization. Conversely, NFV entails moving traditionally 
hardware-based tasks (such as firewall, load balancing, and path computa- 
tion) to a software-based cloud. By moving away from specialized physi- 
cal hardware and towards a more versatile virtualized environment, this 
approach improves flexibility and adaptability. 
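
The split between control plane and data plane described above can be
illustrated with a toy match-action model in Python. This is only a
conceptual sketch: it does not implement the OpenFlow protocol or any
controller API. The class names, header field names, and action strings are
hypothetical, and returning "send_to_controller" on a table miss is an
assumed default.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    match: dict       # header field -> required value; absent fields are wildcards
    action: str       # e.g. "output:2" or "drop"
    priority: int = 0


@dataclass
class Switch:
    """Data plane: forwards packets using rules installed by the controller."""
    table: list = field(default_factory=list)

    def install(self, entry):
        self.table.append(entry)
        # Keep the highest-priority entries first so they match first.
        self.table.sort(key=lambda e: e.priority, reverse=True)

    def forward(self, packet):
        for entry in self.table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.action
        return "send_to_controller"  # table miss (assumed default)


class Controller:
    """Control plane: holds the policy and pushes flows to every switch."""

    def __init__(self, switches):
        self.switches = switches

    def push_policy(self, match, action, priority=0):
        for switch in self.switches:
            switch.install(FlowEntry(dict(match), action, priority))


s1, s2 = Switch(), Switch()
controller = Controller([s1, s2])
controller.push_policy({"ip_dst": "10.0.0.2"}, "output:2", priority=10)
controller.push_policy({"ip_dst": "10.0.0.2", "udp_dst": 5004}, "drop", priority=20)
print(s1.forward({"ip_dst": "10.0.0.2", "udp_dst": 5004}))  # drop
print(s2.forward({"ip_dst": "10.0.0.2", "udp_dst": 80}))    # output:2
```

One call to push_policy updates every switch, which mirrors the main benefit
named in the text: configurations and flows are enforced network-wide from a
single place.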

SDN and NFV are two examples of new technologies being investigated 
for their potential to provide useful answers in this quest. In particular, SDN
has attracted a lot of interest from academics and shown good results in 
data centre networks, where optimizing network and IT resources together 
is a main goal. The use of SDN by Google to manage the connections 
between its many data centres is a prime example [8]. A new paradigm in 
networking, SDN overcomes the shortcomings of conventional networks. It 
separates the control logic of a network from the forwarding plane, which 
was formerly tightly coupled. A network operating system or logically 
centralized controller realizes the control plane, which streamlines evolu- 
tion, configuration, and policy enforcement [9]. NFV provides a new way 
to programme the network, allowing the operator to automate the man- 
agement of data plane devices and optimize the use of network resources. 
Consequently, this improves the network’s performance in terms of data 
handling, control, and management [10]. 

NFV is a network architecture concept that uses software running
on commercial off-the-shelf servers to replace dedicated
network equipment like switches, routers, and firewalls. Saving energy,
optimizing load, and making networks more scalable are all advantages of 
this method. One or more virtual machines (VMs) running various appli- 
cations and processes spanning storage, network servers, switches, or even 
the cloud computing infrastructure make up the NFV architecture shown 
in Figure 2.2. As a result, specialized hardware appliances for certain net- 
work tasks are no longer necessary. SDN and NFV are complementary 
technologies. As a component of SDN, NFV virtualizes the SDN control- 
ler, making cloud deployment possible and allowing for dynamic controller 
movement to best-fit locations. By contrast, SDN enables NFV by providing 
customizable network connectivity across NFVs, leading to improved traffic
engineering. Despite their shared interests, NFV and SDN do not share
a common architecture since they are products of separate standards
organizations. There are a number of proposed SDN designs for the Internet
of Things (IoT), but not all of them include a virtualized approach. Some
examples include software-defined wireless network (SDWN), Sensor OF,
and software-defined networking wireless sensor network (SDN-WISE) [11].
This work is an early attempt to connect SDN, NFV, and the IoT. With this
integrated method in place, IoT devices can share network resources in an
effective and dependable manner. This chapter presents an
easy-to-understand SDN-IoT architecture that uses NFV to solve the
scalability and mobility problems that plague the IoT. Improving network
agility and efficiency for IoT applications is the desired result.

Processing, correlating, and analysing raw data acquired from varied 
devices, including sensors, are crucial in IoT settings. Unfortunately, these 
procedures need to be carried out externally because these devices have lim- 
ited resources. For this reason, it is essential to combine AI with superior 
analytical capabilities in order to learn anything from the massive amounts 
of data sent by IoT devices. Cloud of Things and Everything as a Service are 
two unique applications that arise from this strategy [12]. One of the biggest
obstacles in this situation is ensuring adequate service quality. SDN and
NFV are critical to the upkeep of service-level agreements (SLAs) from a 
Quality-of-Service (QoS) standpoint. Their adaptability allows for the con- 
trol and introduction of new network features or sensors in reaction to declin- 
ing QoS levels or customer demands for supplementary services. Both the 
service’s QoS and the end users’ Quality of Experience (QoE) are improved 
by this. Cuts to operating and capital expenditures (OPEX and CAPEX) will 
have a multiplicative effect on the service and telecom industries [13]. 

In this research, we present an IoT design that uses NFV and SDN to 
solve and show proofs of concept for QoS problems in these kinds of
scenarios. The chapter is structured into five sections, beginning with the
current introduction. Section 2.2 outlines the related work. The subsequent
section, Section 2.3, presents the methodology. Section 2.4 presents the 
results and discussion. The chapter concludes with Section 2.5. 


2.2 RELATED WORK 


SDN-based architectures for horizontal IoT services have been presented 
in studies [14,15] in order to deal with the different protocols that connect 
sensing and network domains. These designs make use of a gateway, which 
is an OF-based switch, to allow for the sending and receiving of instructions 
across many protocols. Converting protocols between the two domains is 
the major function of the gateway. Using a software-defined data plane, 
another study [16] suggested a method to connect two sensing domains. 
Here, the data plane plays the role of a bridge depending on the situa- 
tion, allowing the SDN model to grow to include Layer 7 packet manipu- 
lation capabilities via the extension of programming functions inside the 
OF standard. Deploying these techniques in large-scale network setups is
problematic since they demand more computing resources and memory at
the gateways, despite their promise. Figures 2.1 and 2.2 describe the SDN
network and interconnection of IoT devices.

Figure 2.1 Software-defined network architecture for Internet of Things
platform and integrated Internet of Things gateways.

Figure 2.2 Interconnections among IoT components.

There have been a number of recent proposals to tackle the problem of
installing large software packages on IoT gateways. As an example, a study 
[17] proposed using Docker containers to build gateway services in massive 
networks. In order to manage software-defined IoT systems, other research, 
including [18,19], included agents on gateways. Automating push and pull 
provisioning on gateways was the primary focus of research [20], which 
centred on software creation and packaging on the server side. When it 
comes to deployment, setup, and maintenance, managing IoT gateways gets 
more complicated as the volume of IoT networks increases. 

To address this, VNFs, or Virtual Network Functions, are built using 
SDN and NFV. These VNFs are then saved as images or containers within 
VMs. Because of this, NFV orchestration may produce, activate, or change 
the status of specific instances or services operating on IoT gateways with 
ease. To build an IoT infrastructure that can deploy different network func- 
tionalities on IoT gateways, it is considered as a feasible and resilient option 
to combine NFV orchestrations with SDN controllers and cloud software 
development. Data storage, processing, and transmission technologies that 
reduce the quantity of data delivered to the core are imperative, given the 
large surge in IoT traffic across networks [21]. An important step forwards 
in IoT infrastructure is the use of SDN/NFV and P4-based (Programming
Protocol-independent Packet Processors) switches on IoT gateways. This allows for benefits
including fabric end-to-end (E2E) connectivity, dynamic scaling, and data 
pre-processing at the gateways. In addition, studies in Refs. [22,23] explored 
the technical relationships between the development of the IoT, Big Data 
Analytics (Big Data), Cloud Computing (Cloud), and SDN in the future 5G 
era. According to these studies, 5G networks can handle the varied needs
of IoT applications and transport the massive volumes of data produced by 
them more quickly and cheaply than previous generations. Data processing 
and storage can be accomplished through the use of cloud computing and 
Big Data, and SDN/NFV can set up a scalable network for optimal transfer 
of big data. In order to streamline the process of setting up instances for 
every gateway, NFV plays a vital role. The goal of these 5G designs is to 
facilitate the smooth incorporation of these technologies into IoT networks 
and applications, thereby accelerating their development. In a large-scale 
IoT ecosystem, one use case is the offloading of IoT gateway services. 


2.3 METHODOLOGY 


In the Full-SDMN (Software-Defined Mobile Middleware) architecture, the
collaboration between NFV orchestration, Full-SDMN orchestration, and the
SDN controller plays a vital role. For network providers, Full-SDMN
orchestration means coming up with new service policies that let them
monitor and regulate things like security, IoT apps, mobile virtual network
operators (MVNOs), and content delivery networks (CDNs). Based on the
services, Full-SDMN orchestration creates virtual core networks (VCNs),
which in turn build flows to programme the network components. NFV
orchestration guarantees the creation and operation of these functions on
the underlying network [24]. Lastly, SDN controllers manage the network’s
physical components and operations. Crucial elements within the
architecture include Home Gateways (GWs), situated at the edge network,
functioning as gateways for various edge networks like 3GPP RAN (Radio
Access Network), non-3GPP RAN, or wireline networks. As a result, Home
GWs need to meet specific criteria regarding flexibility and scalability
[25]. This involves the ability to create slices, manage and sustain the
network, and host IoT services (see Figure 2.3).

Figure 2.3 Full-SDMN architecture.

This section details how various IoT providers roll out their services on a
common network in the context of 5G for IoT applications. The three
tenants in the multi-tenant IoT structure of Figure 2.4 represent different
IoT applications: smart homes (Tenant 1), automated vehicles (Tenant 2),
and electronic health records (Tenant 3). For automated cars,
communication between vehicles and the network infrastructure is essential
in order to carry out a number of services, such as providing real-time
traffic warnings, mapping, local weather updates, parking information, and
popular news [26,27]. Due to the time-sensitive nature of these services,
heterogeneous networks like 3GPP RANs, non-3GPP RANs, or Wireless Sensor
Networks (WSNs) can be used to connect cars. Optimal throughput and low
latency may also be achieved through the use of novel air interfaces
(multi-radio access technology (RAT)) that guarantee dependable
connectivity via a VCN and Virtual Radio Access Network (VRAN) or adjacent
Home Gateways linked with RANs. Applications such as v-3GPP, v-non3GPP,
and v-multi-RAT controllers can create and administer virtual radio access
networks (VRANs). Likewise, a Multi-Tenants Controller Application manages
and controls the construction of a VCN to provide consistent and reliable
services. A controller application for service recovery handles possible
service disruptions. Additionally, a Multi-Tenants Controller Application
can set up a VCN channel and operate it in situations where vehicles need
non-safety services, such as local weather updates and parking information,
which are stored on servers within IoT services or on the Internet. This
application ensures effective and customized service delivery by creating
various network slices that are tailored to the individual requirements of
the IoT applications as described in Figure 2.4.
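
The idea of per-tenant network slices can be sketched as a simple admission
check in Python. This is an illustrative model only, not the Multi-Tenants
Controller Application itself. The class names, the latency and bandwidth
figures, and the rule that a slice is admitted only while the sum of
guaranteed bandwidths fits the link capacity are all assumptions made for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceRequest:
    tenant: str
    max_latency_ms: float      # hypothetical latency target
    min_bandwidth_mbps: float  # hypothetical bandwidth guarantee


class MultiTenantController:
    """Admits one slice per tenant while guaranteed bandwidth fits the link."""

    def __init__(self, link_capacity_mbps):
        self.capacity = link_capacity_mbps
        self.slices = {}

    def reserved(self):
        return sum(s.min_bandwidth_mbps for s in self.slices.values())

    def create_slice(self, request):
        if request.tenant in self.slices:
            raise ValueError(f"{request.tenant} already has a slice")
        if self.reserved() + request.min_bandwidth_mbps > self.capacity:
            return False  # not enough capacity left for this guarantee
        self.slices[request.tenant] = request
        return True

    def release_slice(self, tenant):
        self.slices.pop(tenant, None)


# Hypothetical requirements for the three tenants named in the text.
controller = MultiTenantController(link_capacity_mbps=100)
results = [
    controller.create_slice(SliceRequest("smart-home", 100, 20)),
    controller.create_slice(SliceRequest("automated-vehicles", 10, 50)),
    controller.create_slice(SliceRequest("e-health", 50, 40)),
]
print(results, controller.reserved())  # [True, True, False] 70
```

In this toy run the third request is rejected because only 30 Mbps remain.
A real orchestrator would also weigh latency targets and could resize
existing slices instead of rejecting the request.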

In the context of the e-healthcare scenario, a diverse array of intelligent 
devices, including smart sensors and actuators such as heartbeat sensors, 
body sensors, blood pressure devices, and wearable smart medical sensors, 
generate substantial amounts of data. This data encompasses information 
from sensors in WSNs like 6LoWPAN and ZigBee, non-3GPP RANs such as
Wi-Fi, LoRaWAN, and Bluetooth Low Energy, as well as personal patient data,
health conditions, and medical treatment histories [28]. Consequently, 
the e-health network is required to possess capabilities for effective com- 
munication, perception, data processing, analysis, and the conversion of 
physical data into tangible effects [9]. Improving healthcare services while 
decreasing healthcare expenses is the ultimate aim. The next scenario is 
the smart home, where, like e-healthcare, the IoT is being used to seam- 
lessly integrate smart sensors into smart home surroundings. The ultimate 
goal of this integration is to make home users’ lives easier by better moni- 
toring and coordinating the living space [29,30]. Central air conditioning, 
lighting, curtains, and heating systems, as well as other multimedia sys- 
tems like video intercoms, background music, and smart appliances, all 
incorporate smart sensors. Home gateways (GWs) allow these devices to
connect to remote services, interact over the IoT, and intelligently
control other physical devices for purposes such as security,
entertainment, energy saving, and safety. Home GWs are therefore vital in
the smart home environment, forwarding packets to their final destinations
and separating services.


2.4 RESULTS AND DISCUSSION 


Figure 2.4 The architecture for Internet of Things framework.

The simulation findings centre on the behaviour of the Infrastructure
and Control and Virtualization Layers to show that the suggested
framework is feasible. Mininet emulations were used for this purpose [31].
The research community has widely adopted Mininet as an SDN emulator.
Mininet provides virtual hosts, OpenFlow (OF) switches, and links, and
uses Python scripts to build user-defined network topologies. In real-time
applications, emulation performance depends directly on the capacity of
the underlying host system, so small-scale topologies are used here to
keep the results accurate and memory/CPU usage isolated. The topology is
emulated in a virtual machine running Ubuntu Linux 16.04 on an Acer
Swift 3 machine (Intel i7, 2.7 GHz, 8 GB). The topology uses core,
aggregation, and edge OpenFlow-enabled switches (s1-s7) to simulate a
shared data center. An SDN controller, Floodlight, is linked to the
switches. Floodlight offers a representational state transfer application
programming interface (REST API) for requesting the primary SDN services
needed to deploy SDN-Apps. The Quality-of-Service-App (QoS-App) [14] is
used in this experiment. Two hosts are linked to each edge switch (s4-s7),
giving eight hosts in total (h1-h8).

Virtual hosts stand in for IoT devices and generate video-streaming
traffic. Using the Real-time Transport Protocol over the User Datagram
Protocol (RTP/UDP), the virtual hosts stream the video file “highway_cif”
(Telecommunication Networks Group) from a VideoLAN client (VLC) server.
The video’s resolution is 352×288 and its file size is 4.23 MB. The file,
highway_cif.mp4, is MPEG-4 encoded. The 66-second video contains 2,000
frames.

The experiment’s goal is to demonstrate the controller’s capability to
modify the behaviour of switches without interfering with the network’s
regular operation. The experiment uses two streaming flows that are sent
over the network at the same time: one delivers video from h2 to h7, while
the other sends video from h1 to h8. At the start, neither stream receives
Quality-of-Service treatment. At 45 seconds into the streaming, the
QoS-App is downloaded and installed on the controller. Afterwards, the
QoS-App is configured automatically to regulate the link bandwidth per
flow: data rates of up to 2 Mbps are allowed for the w-QoS stream (from h2
to h7) and up to 0.4 Mbps for the w/o-QoS stream (from h1 to h8).

Following this, the received streams are saved as individual video files.
The EvalVid tool [32], developed by the Department of Telecommunication
Systems, is used to analyse the received files. After decoding the files
to YUV format, EvalVid compares the two streams using metrics such as the
Structural Similarity Index Metric (SSIM) and Peak Signal-to-Noise Ratio
(PSNR). Because running several programs at once in the same VM (Mininet,
Floodlight, VLC, EvalVid) may cause fluctuations in available CPU and
memory, a Monte Carlo approach is used: the scenario is run 20 times and
the resulting average is analysed.
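The emulated topology described above can be sketched in plain Python as a list of links. This is only an illustration under stated assumptions: the text names the switches (s1-s7), the edge switches (s4-s7), and the hosts (h1-h8), but the wiring between core and aggregation switches and the assignment of hosts to edge switches in numeric order are assumptions made here, and the helper names are hypothetical.

```python
# Sketch of the emulated topology as plain data (no Mininet required).
# ASSUMPTIONS (not stated in the text): s1 is the single core switch,
# s2-s3 are aggregation switches, each aggregation switch serves two
# edge switches, and hosts attach to edge switches in numeric order.

def build_topology():
    """Return (switches, hosts, links) for the 7-switch, 8-host tree."""
    switches = [f"s{i}" for i in range(1, 8)]   # s1..s7
    hosts = [f"h{i}" for i in range(1, 9)]      # h1..h8
    links = [
        ("s1", "s2"), ("s1", "s3"),             # core to aggregation
        ("s2", "s4"), ("s2", "s5"),             # aggregation to edge
        ("s3", "s6"), ("s3", "s7"),
    ]
    edge_switches = ["s4", "s5", "s6", "s7"]
    for index, host in enumerate(hosts):
        # Two consecutive hosts share one edge switch.
        links.append((edge_switches[index // 2], host))
    return switches, hosts, links


def hosts_per_switch(links):
    """Count how many hosts hang off each switch."""
    counts = {}
    for a, b in links:
        if b.startswith("h"):
            counts[a] = counts.get(a, 0) + 1
    return counts


switches, hosts, links = build_topology()
print(hosts_per_switch(links))
```

In a Mininet script, the same names and links would typically be handed to `Topo.addSwitch`, `Topo.addHost`, and `Topo.addLink`; keeping them as plain data first makes the host-to-switch mapping easy to check before any emulation is started.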

The experiment results, depicted in Figures 2.5 and 2.6, show the average
PSNR and SSIM against the frame number. The solid line represents the
w-QoS stream, while the dotted line represents the w/o-QoS stream.
Initially, both streaming flows exhibit similar behaviour (best effort),
leading to comparable average PSNR values. The vertical black line
indicates the point at which the QoS-App is downloaded and configured in
the Floodlight controller. As anticipated, after the QoS-App
configuration, the switches’ behaviour is automatically adjusted to
identify flows and assign different QoS policies. Consequently, the h2-h7
traffic demonstrates superior PSNR levels compared to the h1-h8 flow. The
average PSNR for the w-QoS stream is 26.54 dB, while the w/o-QoS stream
averages 23.73 dB.

Figure 2.5 Peak signal-to-noise ratio.

Figure 2.6 Structural similarity index metric.
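For readers unfamiliar with the metric, the sketch below shows how PSNR is conventionally computed for 8-bit frames, PSNR = 10 log10(255² / MSE), and how per-frame values from repeated runs can be averaged. It is a minimal illustration, not EvalVid's implementation; the function names and the toy pixel values are hypothetical.

```python
import math


def psnr(reference, received, max_value=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(received) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


def average_psnr(runs):
    """Average per-frame PSNR over repeated runs.

    `runs` is a list of runs; each run is a list of per-frame PSNR values.
    Returns one averaged value per frame index.
    """
    frame_count = len(runs[0])
    return [sum(run[i] for run in runs) / len(runs) for i in range(frame_count)]


# Toy 4-pixel "frames" (hypothetical values, not data from the experiment).
original = [52, 55, 61, 59]
degraded = [54, 55, 60, 62]
print(round(psnr(original, degraded), 2))
```

A lower PSNR means the received frame differs more from the original, which is why a bandwidth-limited stream is expected to score lower.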

The plot also shows a notable dip. The experiments reveal that this
unexpected effect is primarily induced by fast-moving scenes during that
period: the increased network load during these scenes leads to a decrease
in PSNR.

For the w-QoS stream, the average SSIM is 0.814, whereas for the w/o-QoS
stream it is 0.738. The results show that the controller can adjust the
network’s behaviour and data flow balance on the fly to meet user needs.
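As a rough reading of the reported averages, the gaps between the two streams can be worked out directly. The four input numbers come from the text; the conversion of a dB gap into an error ratio is the standard identity for PSNR and is only approximate when applied to averaged values.

```python
psnr_with_qos, psnr_without_qos = 26.54, 23.73   # dB, from the text
ssim_with_qos, ssim_without_qos = 0.814, 0.738   # from the text

psnr_gain_db = psnr_with_qos - psnr_without_qos
# A PSNR gap of d dB corresponds to an MSE ratio of 10 ** (d / 10).
mse_ratio = 10 ** (psnr_gain_db / 10)
ssim_gain = ssim_with_qos - ssim_without_qos
ssim_gain_percent = 100 * ssim_gain / ssim_without_qos

print(f"PSNR gain: {psnr_gain_db:.2f} dB (MSE ratio about {mse_ratio:.2f})")
print(f"SSIM gain: {ssim_gain:.3f} ({ssim_gain_percent:.1f}% relative)")
```

In other words, the w-QoS stream is about 2.81 dB better in PSNR and roughly 10% better in SSIM than the w/o-QoS stream.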


2.5 CONCLUSION 


The present work examines the benefits brought about by SDN and NFV 
paradigms in IoT environments and proposes an SDN/NFV architecture 
tailored for IoT networks. The experiment evaluates the controller’s ability 
to dynamically manage network behaviour in the context of the ITSCO 
2018 — Special Session on IoT and Smart Communities. The test topology, 
based on Mininet, and the analysis of video streaming demonstrate that 
the Floodlight controller can modify the QoS/QoE of different flows in
real time. Future research challenges involve balancing and orchestrating
virtual resources in IoT environments.
Additionally, optimizing algorithms for real-time streaming within SDN/ 
NFV architectures presents a significant challenge. 
Chapter 3


Decentralized trust 


A framework for ensuring data
integrity in IoT using blockchain


R. Praveen Sam, A. Ravi Kumar, 
V. Subbaramaiah, and G. Anil Kumar 


3.1 INTRODUCTION 


The swift evolution of the Internet of Things (IoT) is fundamentally reshap- 
ing our daily lives [1]. With an increasing number of physical devices, such 
as smartphones, wearables, and vehicles, connecting to the Internet via 
embedded systems and sensors, substantial data can be gathered and trans- 
mitted to cloud computing systems for efficient data analysis and quicker 
decision-making. Additionally, these devices can execute tasks beyond 
human capabilities, exemplified by unmanned aerial vehicles, or drones, 
operating as a microcosm of IoT, undertaking diverse activities like package 
delivery, crop quality monitoring, and anomaly detection in farming [2]. 
However, as IoT expands, the heightened connectivity and growing com- 
plexity of computing infrastructure expose vulnerabilities to cyber-attacks. 
Some physical devices are situated in insecure environments, susceptible 
to tampering by hackers. The wireless sensor network facilitates the trans- 
mission of data and operational commands to the Internet, a potentially 
untrusted communication channel, making them prone to unauthorized 
alterations. Consequently, ensuring device authorizations and maintaining 
data provenance [3,4] emerges as a critical concern. 

Furthermore, numerous existing IoT systems depend on centralized
communication models connecting to servers or cloud computing platforms that
handle processing and data storage. The predicament here lies in the server 
becoming a bottleneck and a prime target for cyber-attacks [5]. It also serves 
as a potential point of failure that could disrupt the entire network, impact- 
ing data integrity. Therefore, the challenge persists in establishing a genu- 
inely trustworthy and integrated environment to support interconnected 
devices and computing infrastructure for secure data transfer. Improving 
communication security in a diversified environment can be achieved by 
creating a new framework based on blockchain technology for big data 
analytics inside the framework of smart city architecture. Blockchain is 
a revolutionary distributed ledger system that offers improved security 
due to its decentralized operation [6]. A chain of linked blocks containing 
considerable data generated from transactions is created when fresh
transactions are recorded in verified blocks. These blocks store a plethora of
information, creating a detailed audit trail of every transaction. The dis- 
tributed ledger technology known as blockchain eliminates the need for a 
central authority to verify transactions. Traditional data analysis methods 
face a formidable obstacle in the form of big data, which is defined as the 
collection of massive measurements [7]. As real-time big data applications 
continue to grow in popularity, so does the need for predictive big data 
analytics. Cameras, infrastructure, and smartphone apps let local govern- 
ments monitor things like traffic, energy usage, and air pollution. Services, 
transportation, and public safety can all benefit from this data once it has 
been handled using technological methods [8]. Figure 3.1 shows the steps 
involved in collecting and processing data for smart cities using big data. 
In the contemporary landscape, diverse methods are employed for 
processing large volumes of transaction information online. Blockchain 
technology has proven to be highly effective in facilitating online data 
processing. Operating at a superior level, the distributed infrastructure 
of blockchain enables multiple remote accesses. In the course of transac- 
tions, data is kept in distinct databases by a number of different entities; 
the implementation of blockchain technology enables these groups to get 
access to a comprehensive system [9]. The main novelty of this research is
the integration of blockchain with big data to improve the security of
smart city connectivity. By developing an effective framework for
communication among the many linked devices in smart cities, this study
incorporates enhanced blockchain and big data components, marking a major
step forward. This study differs from others in that it seeks to remedy
problems with data transmission, validation, package width range, and the
decrease in information exchange between node generations. The methodology
employed is rigorous and scientifically implemented, leading to a revised
framework that makes the extensive findings of our study reliably
predictable. Additionally, computational findings have undergone validation
to ensure their accuracy.

Figure 3.1 Assessing the efficacy of data integrity validation across the entire spectrum
from Data Owner Application to Cloud Storage Service-Y.
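The idea of a chain of linked blocks that forms an audit trail can be illustrated with a few lines of Python. This is a minimal sketch under simplifying assumptions: there is no consensus, mining, or signing, the field names are hypothetical, and SHA-256 over a JSON encoding is chosen only for illustration.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"],
                                block["transactions"])
        if block["hash"] != recomputed:
            return False
    return True


chain = []
append_block(chain, ["sensor-17 reading=21.5"])
append_block(chain, ["sensor-17 reading=21.7"])
print(is_valid(chain))                                  # True
chain[0]["transactions"][0] = "sensor-17 reading=99.9"  # tamper with history
print(is_valid(chain))                                  # False
```

Because each block stores the hash of its predecessor, changing an old transaction invalidates the stored hash of that block, which is what makes the ledger tamper-evident.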

Blockchain, the technology underpinning Bitcoin, introduces a fully 
decentralized system, prompting subsequent endeavors to apply it in decen- 
tralizing existing Internet service infrastructures [10]. The blockchain 
system and data storage procedures have both benefited from the many 
efforts that have investigated its potential use for verifying the integrity 
of stored data since its inception. Retricoin [11] is one such project that 
aims to use Proof of Retrievability (PoR) instead of the energy-intensive 
Proof of Work (PoW) to verify data integrity and generate coins for big 
files. These methods indicate a bright future for completely decentralized 
data integrity assurance. Despite the optimistic outlook, the practicality of 
decentralized data storage with acceptable efficiency remains a challenge. 
Currently, cloud storage services are often considered, but, for data integ- 
rity services, exploring decentralized frameworks becomes worthwhile. 
This paper proposes a blockchain-based framework to facilitate decentral- 
ized data integrity verification for IoT data stored in semi-trusted cloud 
environments. The primary contributions of this paper can be succinctly 
outlined as follows. 


• The article substitutes the centralized node’s Integrity Management
Service with a completely decentralized Data Integrity Service (DIS)
based on blockchain. This removes the need for trust in third-party
auditors (TPAs) and enhances the dependability of the DIS.

• The document suggests protocols for verifying data integrity within a
fully decentralized setting and presents a framework that allows both
data owners (DOs) and data consumers to authenticate specific data
without depending on any singular TPA.

• The document illustrates the practicality of the suggested protocols
and framework by creating a proof-of-concept demonstrator
implemented on a private blockchain system.


The remainder of the chapter is structured as follows: In Section 3.2, we 
conduct a review of relevant literature to offer insights into the motivation 
behind our research. Section 3.3 provides methodology. Section 3.4 out- 
lines the results and discussion. The research concludes with a summary 
and suggestions for potential future research in Section 3.5. 
3.2 LITERATURE SURVEY


In the context of cloud storage, ensuring the integrity of data becomes an 
essential component of the security strategy. This enables users to check the 
integrity of their data that is stored externally in an efficient manner. Here 
we review existing work on auditing systems,
public verification, and blockchain-based authentication of data.


3.2.1 Static model 


Numerous research endeavors have explored two fundamental static mod- 
els for auditing outsourced data in cloud storage [12]. In one instance, a 
provable data possession (PDP) model was introduced in Ref. [13], leveraging
RSA-based homomorphic verifiable tags. However, this model exclusively
addresses static data storage and lacks a thorough security analysis [14].
Despite supporting public verifiability (delegating verification to a third 
party) and block-less verification (verifying without retrieving the raw data 
block), the scheme’s extension to a scalable PDP version still falls short 
in fully supporting dynamic data verification [15,16]. It encounters limita- 
tions when users attempt to insert, modify, or delete blocks dynamically. 
Another model, the proof of retrievability (PoR) model, was proposed in
Ref. [17]. This model ensures the correct storage of data by cloud servers
and efficient data retrieval for users, employing sampling and
error-correcting codes. However, the PoR scheme assumes a fixed number of
queries for users, restricting its applicability to static data storage
scenarios. An alternative retrievability proof scheme was introduced in
Ref. [18], employing
the Boneh-Lynn-Shacham short signature scheme [19]. While efficient and 
compact, this scheme does not support dynamic data integrity verification. 

The PoR strategy places a more rigorous demand on data recoverabil- 
ity than the PDP scheme, which just checks data integrity. The PoR strat- 
egy improves data redundancy to withstand a given level of data loss or 
corruption by encoding and recovering data using error correction code 
algorithms. 
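The value of sampling in such audits can be seen from a short calculation. If a fraction f of the stored blocks is corrupted and the verifier checks c blocks chosen independently and uniformly at random, at least one bad block is caught with probability 1 - (1 - f)^c. The sketch below works this out; the independence assumption and the function names are ours, and the exact analysis in the cited schemes may differ.

```python
def detection_probability(corrupted_fraction, sampled_blocks):
    """Chance that at least one sampled block is a corrupted one.

    Assumes blocks are sampled independently and uniformly at random.
    """
    if not 0 <= corrupted_fraction <= 1:
        raise ValueError("corrupted_fraction must be between 0 and 1")
    return 1 - (1 - corrupted_fraction) ** sampled_blocks


def blocks_needed(corrupted_fraction, target_probability):
    """Smallest number of sampled blocks that reaches the target probability."""
    if not 0 < corrupted_fraction <= 1 or not 0 <= target_probability < 1:
        raise ValueError("need 0 < corrupted_fraction <= 1 and 0 <= target < 1")
    count = 0
    while detection_probability(corrupted_fraction, count) < target_probability:
        count += 1
    return count


print(detection_probability(0.01, 460))
print(blocks_needed(0.01, 0.99))
```

With 1% of blocks corrupted, checking a few hundred blocks already gives about 99% confidence regardless of how large the file is, which is why sampling keeps the audit cost low.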


3.2.2 Dynamic model 


A dynamic PDP technique was presented in Ref. [20] to enable thorough
dynamic data updates in the cloud for data integrity verification. This 
scheme involves splitting the data file into equal-length data blocks. 
For the purpose of integrity verification, each data block is given a tag. 
Modification, insertion, and deletion of data all occur on the smallest pos- 
sible unit, the data block. The scheme uses an authentication database with 
rank information to maintain and validate the legitimacy of these tags. 
There is no public verification support in this scheme, even though it can 
validate dynamic data. 
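The block-and-tag idea can be sketched as follows. This is a deliberately simplified stand-in, not the scheme of Ref. [20]: it uses an HMAC-SHA256 tag per block (our choice), a hypothetical key, and a tiny block size, and it omits the authenticated structure with rank information described above.

```python
import hashlib
import hmac

BLOCK_SIZE = 4  # bytes; tiny value chosen only to keep the example readable


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split data into equal-length blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def make_tag(key, block):
    """Integrity tag for one block: HMAC-SHA256 over the block contents."""
    return hmac.new(key, block, hashlib.sha256).digest()


def verify_block(key, block, tag):
    """Check a block against its tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, block), tag)


key = b"owner-secret-key"            # hypothetical key held by the data owner
blocks = split_blocks(b"temperature=21.5")
tags = [make_tag(key, b) for b in blocks]

# Modify: only the affected block's tag is recomputed.
blocks[1] = b"XXXX"
tags[1] = make_tag(key, blocks[1])

# Insert and delete operate on whole blocks as well.
blocks.insert(2, b"new!")
tags.insert(2, make_tag(key, b"new!"))
del blocks[0], tags[0]

print(all(verify_block(key, b, t) for b, t in zip(blocks, tags)))  # True
```

Note that these tags do not bind a block to its position, so a server could reorder blocks without detection; binding tags to positions under insertions and deletions is the kind of problem that position-aware authenticated structures are meant to solve.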
There is no way to dynamically update the cloud data in the static
models; it stays static. Dynamic models give cloud DOs the power to make
real-time adjustments to their data. However, a lot of storage, communica- 
tion, and processing power are required by both static and dynamic models. 


3.2.3 Public verification 


In most cases, allocating substantial computing and communication resources 
is necessary for data integrity verification. In order to lessen the impact on 
compute, communication, and storage resources, Data Owners (DOs) fre- 
quently assign the task of conducting verifications to a Third-Party Auditor 
(TPA) so that data users do not have to bear the overhead of managing and 
verifying the integrity of the data themselves. This delegation helps to ensure 
that the data remains trustworthy and secure, while minimizing the burden 
on the data users and allowing them to focus on utilizing the data rather than 
verifying it [21,22]. However, involving a TPA in an audit raises security
concerns, since the TPA can learn about the outsourced data. As a
result, having full faith in the TPA is not a given, which raises concerns about 
potential security and privacy risks. Consequently, protecting user data from 
the TPA is crucial, particularly for cloud storage of critical or secret material. 

There has been little progress in public verification efforts thus far. An 
effective method for public verification of the completeness of dynamic 
data was suggested by Ref. [23]. Unfortunately, this scheme’s communica- 
tion overhead during verification makes it unworkable. Using a data struc- 
ture known as the Range Information Authentication 2-3 tree, which can 
securely handle dynamic data, a comprehensive dynamic update capability 
was provided in Refs. [24,25]. Regarding public verifiability, Ref. [26]
looked at ways to make public authentication more secure by processing
outsourced data before authentication, which would stop TPAs from stealing
users’ data during public verification. Ref. [27] used a random mask
mechanism to prevent TPAs from learning the data during verification, but
the scheme has a high computational cost. Additionally, the centralized
auditing service in this scheme is vulnerable, as a breakdown in
the centralized service could lead to a complete halt of the auditing service. 
In Ref. [28], a privacy-aware public auditing mechanism for shared cloud 
data was proposed, constructing a homomorphic verifiable group signa- 
ture [29]. However, this mechanism faces challenges, including potential 
interruptions in auditing services if the TPA is under attack. Furthermore, 
users could exploit security issues to manipulate compensation from Cloud 
Service Providers (CSPs). 

Global distributed file system construction has been the subject of much 
research, with mixed results. In the academic community, Andrew file sys- 
tem (AFS) is known as a very effective system. Over 100 million users can 
now be accommodated by industry platforms like BitTorrent, Napster, and 
KaZaA. The development of generic file systems that offer decentralized 
distribution, low latency, and global coverage is an ongoing endeavor [30].
From the client-server paradigm, which was originally developed for dis- 
tributed systems with a smaller scale and a server with more processing 
power, the peer-to-peer (P2P) scheme evolved. Each node in a P2P network 
acts as both a client and a server, allowing for symmetric communication. 
Through the facilitation of direct communication between peers, P2P sys- 
tems overcome bandwidth constraints in file sharing. The scalability and 
efficiency of file sharing are greatly improved when peers share files in parts 
instead of all clients requesting them from a server at once. 
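Sharing files in parts works safely only if each part can be checked on arrival. The sketch below illustrates the idea with per-piece SHA-256 hashes and peers modelled as dictionaries. It is not the BitTorrent or IPFS protocol; the function names, piece size, and sample data are hypothetical.

```python
import hashlib


def make_manifest(data, piece_size):
    """Split data into pieces and record the SHA-256 hash of each piece."""
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    return pieces, [hashlib.sha256(p).hexdigest() for p in pieces]


def download(manifest, peers):
    """Fetch each piece from the first peer that offers a valid copy.

    `peers` is a list of dicts mapping piece index -> bytes.
    """
    result = []
    for index, expected in enumerate(manifest):
        for peer in peers:
            piece = peer.get(index)
            if piece is not None and hashlib.sha256(piece).hexdigest() == expected:
                result.append(piece)
                break
        else:
            raise LookupError(f"no peer has a valid copy of piece {index}")
    return b"".join(result)


data = b"sensor log: 21.5, 21.7, 21.6, 22.0"
pieces, manifest = make_manifest(data, piece_size=8)
peer_a = {0: pieces[0], 1: pieces[1], 2: b"corrupt!"}     # holds one bad piece
peer_b = {i: p for i, p in enumerate(pieces) if i >= 2}   # holds the tail
print(download(manifest, [peer_a, peer_b]) == data)       # True
```

No single peer needs the whole file, and a corrupted piece from one peer is simply rejected and fetched from another.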


3.3 METHODOLOGY 


The four primary components of the proposed system are shown in
Figure 3.2: Data Owner Application (DOA), Data Consumer Applications
(DCAs), Cloud Storage Service (CSS), and Blockchain. There are two
subtypes of CSSs: private and public. Throughout this study, we assume
that a single DOA generates data and uploads it to the CSS, while a number
of DOAs and DCAs need data integrity verification. Organizations that need
the DIS can access it through the blockchain system, which requires them
to launch a blockchain client on their own nodes. Any node can join or
leave the blockchain network at any time. Although the cloud can also
operate as a blockchain node in practice, for the sake of simplicity in
our proposed service structure, CSS refers only to the cloud storage
service. Smart contracts hosted on the blockchain enable the DIS to be
implemented. This decentralized deployment improves the efficiency and
dependability of the DIS.

We make a number of assumptions to address concerns regarding the
efficiency and security of the blockchain system, which have a major
influence on our solution. The first assumption is that, since every node
acts in its own interest, a 51% attack [31] is very unlikely to succeed.
Taking the Bitcoin blockchain as an example, a hostile actor would be less
likely to launch a 51% attack than to engage in mining. The second
assumption is that reaching consensus on a blockchain does not take long.
Ethereum’s blockchain is expected to reach consensus within an average of
12.6 seconds, although current consensus periods may be longer (16-18
seconds per block) or consensus may even fail under poor network
conditions. Improvements in network and consensus algorithms may one day
allow for consistently short consensus times.

Figure 3.2 Contrasting the act of composing messages to DISSC originating from various
Data Owner Applications (DOAs).


3.3.1 Blockchain 


Blockchain technology is the backbone of our proposed Data Integrity
Service (DIS). When the blockchain is first set up, both DOAs and DCAs must
join the network. To join, each node first creates a key pair, the public
key of which represents the node’s blockchain account. To complete a
transaction, the account must hold enough gas, and the secret key is
required to access the account. We assume that a sufficient number of nodes
are ready to act as miners so that the blockchain system can continue to
function.

We implement a pay-per-transaction approach for the DIS by integrating
blockchain into our platform. Gas is consumed only when the DOA needs to
interact with the smart contract. We found that this method greatly
improves the DIS’s flexibility compared to our previous study, which used
the cloud-based Integrity Management Service (IMS) model [32]. Due to
processing limits, DOAs may find it unnecessary and difficult to obtain gas
by acting as miners. In our model, CSPs can therefore continue to make
money by acting as blockchain miners and earning gas, so they lose no
profit. CSPs can then trade the gas they have earned with DOAs, and DOAs
can in turn trade their data with DCAs for gas or incentives. DCAs can
evaluate their hardware capabilities and financial situation to decide
whether to participate as miners.


3.3.2 Data Integrity Service (DIS) 


A smart contract is used to implement the DIS. Before data can be securely
stored in the blockchain by means of a smart contract, it must first be
encrypted locally. All on-chain transactions involving the smart contract
can be transparently audited because each party uses its own account to
interact with it. A node’s blockchain data is synchronized with the entire
blockchain network once the blockchain service (BS) is started. When DOAs
use the smart contract to write data to the blockchain, the data does not
become valid and available to other nodes until the blockchain reaches
consensus. Read operations by DCAs via the smart contract, however, are
very fast, since DCAs are effectively reading data from their locally
synced copies. This feature makes our DIS more efficient than the
cloud-based IMS suggested in our earlier study [11]. Deploying the smart
contract guarantees DIS accessibility and flexibility, since participants
can interact with it at any moment. Only the creator of the smart contract
is authorized to cancel or alter the DIS on the blockchain. The DIS is
accessible at all times so long as the blockchain is running. Shutting
down the blockchain system is also harder than shutting down a service
hosted by a centralized provider, because all participating nodes must
stop providing BSs. Our proposed blockchain-based DIS therefore
outperforms the cloud-based IMS [11] in terms of efficiency and assurance.
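The behaviour described above (writes cost gas and become visible only after consensus, reads are free local lookups, and only the creator can shut the service down) can be modelled with a small in-memory class. This is a toy simulation in Python, not a smart contract; the class, method names, and the gas cost of one unit per write are all hypothetical.

```python
class InsufficientGas(Exception):
    """Raised when an account cannot pay for a write."""


class DataIntegrityService:
    """Toy in-memory stand-in for the DIS smart contract."""

    WRITE_COST = 1  # hypothetical gas units charged per write

    def __init__(self, creator):
        self.creator = creator
        self.active = True
        self.balances = {}   # account -> gas
        self.records = {}    # block_id -> (owner, digest); confirmed state
        self.pending = []    # writes waiting for consensus

    def fund(self, account, gas):
        self.balances[account] = self.balances.get(account, 0) + gas

    def write(self, account, block_id, digest):
        """Queue a record; costs gas and is not readable until confirmed."""
        if not self.active:
            raise RuntimeError("service has been shut down")
        if self.balances.get(account, 0) < self.WRITE_COST:
            raise InsufficientGas(account)
        self.balances[account] -= self.WRITE_COST
        self.pending.append((block_id, account, digest))

    def reach_consensus(self):
        """Simulate a mined block: pending writes become visible."""
        for block_id, account, digest in self.pending:
            self.records[block_id] = (account, digest)
        self.pending.clear()

    def read(self, block_id):
        """Free local lookup, as on a node with a synced copy of the chain."""
        return self.records.get(block_id)

    def shut_down(self, account):
        if account != self.creator:
            raise PermissionError("only the creator may shut down the service")
        self.active = False


dis = DataIntegrityService(creator="doa-1")
dis.fund("doa-1", 2)
dis.write("doa-1", "block-0", "ab12")
print(dis.read("block-0"))     # None: not yet confirmed
dis.reach_consensus()
print(dis.read("block-0"))     # ('doa-1', 'ab12')
```

The gap between `write` and `reach_consensus` mirrors the consensus delay discussed earlier, while `read` never touches the gas balance.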

Cloud Storage Service: Every major cloud computing service provider, 
including Amazon S3, IBM BlueMix, Microsoft Azure, and Digital Ocean,
offers data storage services. These services are designed to cater to clients’ 
economic capacities and application needs, providing flexible cloud storage 
solutions. In our framework, the CSS serves as a versatile data storage solu- 
tion for DOs. Simultaneously, the Peer-to-Peer file system (P2PFS) facili- 
tates data sharing between DOs and data consumers. 


3.4 RESULTS AND DISCUSSION 


The implementation of the proposed system is described in detail in this
section. Our model is based on the proposed service framework, whose
detailed structure is shown in Figure 3.3. DOAs can run on both desktop
computers and mobile devices connected to the Internet. On the devices,
DOAs are responsible for tasks such as data generation, uploading to the
CSS, hash generation, encryption of the resulting hash, and writing to the
Data Integrity Service on the Blockchain (DISSC). Data integrity
verification, in turn, is done by DOAs on PCs. Desktop and cloud-based
DCAs must be able to run the P2PFS and the BS. Running a BS is well within
the capabilities of modern PCs. By IoT devices that gather data from
sensors, we mean single-board computers with General Purpose
Inputs/Outputs (GPIOs). Many modern single-board computers include such
features; one example is the Raspberry Pi. Before sending data blocks to
the CSS, IoT devices can process data directly from sensors. Ethereum is
used to create the blockchain system because it is the most developed
blockchain platform that currently supports smart contracts. The
distributed P2PFS is implemented using IPFS (InterPlanetary File System),
an effort to transfer files in a manner similar to the HyperText Transfer
Protocol (HTTP). IPFS makes data verification in Protocol II and
Protocol IV easier.

Figure 3.3 Time taken by DISSC to respond to both Data Owner Applications and Data
Consumer Applications.
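The division of work between a DOA and a DCA can be sketched as a publish-and-verify round trip. The sketch makes several simplifying assumptions: plain dictionaries stand in for the CSS and the DISSC, SHA-256 is our choice of hash function, the encryption of the hash mentioned above is omitted, and all names are hypothetical.

```python
import hashlib

cloud_storage = {}   # stand-in for the CSS: block_id -> bytes
on_chain = {}        # stand-in for the DISSC: block_id -> hex digest


def doa_publish(block_id, data):
    """DOA side: upload the block, then record its SHA-256 digest."""
    cloud_storage[block_id] = data
    on_chain[block_id] = hashlib.sha256(data).hexdigest()


def dca_verify(block_id):
    """DCA side: download the block and compare against the recorded digest."""
    data = cloud_storage.get(block_id)
    recorded = on_chain.get(block_id)
    if data is None or recorded is None:
        return False
    return hashlib.sha256(data).hexdigest() == recorded


doa_publish("pi2/block-0001", b"temp=21.5;hum=40")
print(dca_verify("pi2/block-0001"))                     # True
cloud_storage["pi2/block-0001"] = b"temp=99.9;hum=40"   # storage-side tampering
print(dca_verify("pi2/block-0001"))                     # False
```

The verifier needs only the block and the recorded digest, so no third-party auditor is involved in the check.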

In order to test the feasibility of our framework and related protocols, 
we have built a prototype system. Due to the ever-changing nature of IoT 
data, we assume that it is stored in data blocks of varying sizes, and that 
these blocks are combined to generate datasets. In most cases, you have 
to check the integrity of certain data blocks before you can guarantee the 
whole dataset is secure. Our current working system is built on a private 
blockchain that has at least four nodes. One of these nodes is in charge of 
hosting P2PFS, the blockchain, and CSS. 

A personal computer runs both DOA and DCA. An IoT device is dedicated to
running DOA for tasks such as data block generation and uploading to the
CSS. A public cloud runs DCA, specifically to assess the efficiency of
data block downloads under various network conditions. Figure 3.1
illustrates the efficiency of verifying data integrity from DOA to CSS-Y.
The comparison results in Figure 3.2 indicate that PC 1 takes
substantially less time than Pi2. This discrepancy can be attributed to at
least two factors: differences in computation capability and variations in
the Ethereum client version. Figure 3.3 depicts the time spent by
different DOAs and DCAs in querying the Data Integrity Service on the
Blockchain (DISSC). Additionally, Figure 3.4 outlines the time spent
retrieving data blocks of different sizes in the CSS-N model. The test
results demonstrate that our proposed framework can effectively support
the integrity verification of data blocks by multiple DOAs or DCAs.

Figure 3.4 Duration required for the retrieval of data blocks.


3.5 CONCLUSION 


Cloud data security revolves around three fundamental aspects: confiden- 
tiality, integrity, and availability, commonly known as CIA. In our earlier 
research, we introduced the concept of Data Integrity as a Service to address 
concerns related to data integrity in CSSs. A notable drawback of existing 
techniques is the reliance on a trusted third-party authority for completing 
data integrity verification tasks. However, this assumption may not always 
hold true, leading to potentially untrustworthy results in data integrity 
verification. This paper introduces efforts to implement a blockchain-based 
DIS, providing several advantages over previous works. It enhances reli- 
ability, as the service cannot be terminated by a single cloud party. The 
efficiency of data integrity verification improves with an increasing number 
of clients. The proposed framework supports the trading of data with data 
consumers and implements a pay-per-transaction model for DIS. 
Chapter 4


Blockchain’s impetus 
for secure IoT-enabled
applications in smart city 


Wasswa Shafik 


4.1 INTRODUCTION 


Smart cities represent a paradigm shift in urban planning and
administration, leveraging modern technology and data to enhance the
well-being of their residents and foster lasting economic and social
progress [1]. The intensifying urbanization of our society, in which the
majority of the global population now lives in urban areas, has created a
pressing need for innovative ways to deal with the complex challenges of
city living [2]. Smart cities offer a proactive solution to these
problems, promoting a new stage of urban development marked by efficiency,
connectivity, and sustainability.

The essential concept underlying the smart city idea is integrating
Information and Communication Technologies (ICT) and the Internet of
Things (IoT) into the physical fabric of city infrastructure in the big
data era. This integration facilitates real-time data collection from a
wide range of sources, such as sensors, cameras, and other IoT devices [3].
It thereby establishes a comprehensive digital neural network, improving
decision-making, optimizing resource allocation, and enhancing the overall
urban experience. Leveraging this large reservoir of information,
municipalities can improve the quality and efficiency of public services,
covering domains such as transport, healthcare, energy management, and
environmental sustainability [4].

The adoption of these advanced technologies paves the way for building a
comprehensive digital infrastructure that captures real-time data from
diverse sources, as shown in Figure 4.1, such as sensors, cameras, and a
myriad of IoT devices [5]. Integrating Blockchain technology provides a
robust mechanism for protecting the integrity and privacy of data,
securing it against illicit changes or breaches. Consequently, this
synergy yields valuable insights for informed, data-driven
decision-making, careful resource allocation, and improvement of the
overall urban environment [6].
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Figure 4.1 A cohesive range of bids for smart cities. 


IoT devices span a wide range of technologies, from environmental sensors
to smart traffic signals, and collectively gather substantial amounts of
data. This data, in turn, yields valuable insights into many facets of city
life. Storing it securely on a Blockchain platform helps ensure its
integrity, transparency, and confidentiality. Smart contracts, a
fundamental feature of Blockchain technology, facilitate automated, secure,
and transparent transactions and agreements [7]. Their applicability spans
numerous domains, including energy usage, waste management, and public
transportation services.

The smart city concept is based on inclusion and sustainability, focusing
on fair and equal access to technology and services for all community
members. The ability of Blockchain technology to give individuals control
over their data and transactions is consistent with the principles of
inclusion and privacy [8]. IoT-enabled devices have the potential to
actively involve citizens in decision-making processes, allowing them to
voluntarily contribute their data in return for improved services, all
while safeguarding their privacy [9].

Secure data management is vital in the context of smart cities, where vast
quantities of data are generated and used for urban development and
administration. Primarily, it ensures the protection of sensitive personal
and public data, thus upholding the privacy and security of individuals.
Data is collected from numerous sources in a smart city, including IoT
devices, security cameras, and systems that facilitate citizen interaction
[10]. Robust data security measures, including encryption, access controls,
and secure storage, are needed to reduce the risk of unauthorized access,
cyberattacks, and data breaches that might compromise the privacy of
individuals and the overall integrity of the smart city ecosystem.

Furthermore, robust data management practices cultivate confidence among
residents and stakeholders and encourage greater participation in
data-sharing initiatives, which is important for improving urban services
and infrastructure. The rationale for incorporating Blockchain technology
in smart cities rests on its ability to improve data security,
transparency, and trust in urban development and governance [11]. Smart
cities depend on extensive networks of IoT devices, sensors, and data
collection points, which produce significant quantities of data on energy
consumption, transportation, public services, and other aspects of city
life.

The decentralized and immutable ledger of Blockchain technology preserves
the integrity and privacy of data, effectively preventing manipulation or
unauthorized access. This not only increases the confidence of citizens and
stakeholders in data accuracy but also strengthens data-driven
decision-making processes [12]. Blockchain technology also enables secure,
transparent, and auditable transactions through smart contracts, thereby
improving the efficiency of city operations, promoting cost-effective
services, and fostering trust in digital interactions [13]. In turn, this
contributes to realizing the smart city vision of sustainable, efficient,
and inclusive urban living.


4.1.1 The chapter contribution 


This study contributes the following as summarized: 


• The study elucidates the profound impact of the IoT on the
  transformation of urban landscapes. It also scrutinizes the diverse range
  of applications of the IoT in smart cities, explicitly exploring the
  utilization of IoT in areas like traffic control, waste management, and
  energy efficiency.

• It provides an overview of Blockchain technology, including its
  principles and features, explaining the concept of decentralization,
  immutability, and consensus mechanisms in Blockchain.

• It discusses how Blockchain technology can enhance the security and
  reliability of IoT systems by exploring use cases of integrating
  Blockchain with IoT in smart cities, such as data integrity, device
  authentication, and secure transactions.


• It highlights the security and privacy challenges in IoT-enabled smart
  cities and details how Blockchain can address these concerns, including
  data encryption and access control.

• It identifies the challenges and limitations of using Blockchain in
  smart cities, addressing scalability issues, energy consumption, and
  regulatory concerns.

• Finally, it ventures into the future of Blockchain and IoT in smart
  cities, discussing potential advancements and emerging trends.


4.1.2 The chapter organization 


Section 4.2 presents the IoT in smart cities, explains its significance in
transforming urban environments, discusses its various applications, and
provides an overview of Blockchain technology, including the concepts of
decentralization, immutability, and consensus mechanisms. Section 4.3
examines the integration of Blockchain with secure IoT applications.
Section 4.4 describes the security and privacy challenges in IoT-enabled
smart cities and elucidates how Blockchain can address these concerns,
including data encryption and access control. Section 4.5 identifies the
challenges and limitations of using Blockchain in smart cities and
addresses scalability issues, energy consumption, and regulatory concerns.
Section 4.6 speculates on the future of Blockchain and IoT in smart cities
and discusses potential advancements and emerging trends. Finally, Section
4.7 presents the conclusion.


4.2 INTERNET OF THINGS AND 
BLOCKCHAIN IN SMART CITIES 


This section explains the significance of the IoT in transforming urban
environments sustainably, with a brief discussion of its various
applications in smart cities.


4.2.1 Internet of Things in smart cities 


The IoT matters for urban environments because it changes how cities are
built, managed, and experienced. Several key factors illustrate its impact,
as listed below.


4.2.1.1 Data-driven decision-making 


Using the IoT for data collection and analysis enables city authorities to
make decisions based on empirical evidence, thus improving the
effectiveness of urban planning and management. By continuously monitoring
and analyzing data collected from various sensors and devices, cities can
respond quickly to dynamic situations, for example by optimizing traffic
patterns to reduce congestion or reallocating resources to areas needing
immediate attention [13]. This data-driven approach lets urban areas react
quickly to changing conditions, thereby improving the overall well-being of
residents.


4.2.1.2 Efficiency and sustainability 


The IoT plays an important role in improving urban sustainability through
its ability to optimize resource allocation. One example is smart waste
management, which can optimize waste collection rounds by selectively
collecting only those containers that have reached full capacity [14]. This
approach yields cost savings and lowers fuel use and carbon emissions. In a
similar vein, a smart lighting system can adjust brightness levels
according to ambient light conditions, thereby reducing energy consumption.
These efficiency gains are directly linked to sustainability, as they
reduce a city’s environmental impact and ease the strain on resources, thus
fostering an environmentally conscious city [15].
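
The two examples above, selective waste collection and ambient-light
dimming, can be sketched in a few lines. This is only an illustration under
stated assumptions: the 80% fill threshold, the lux limits, and the names
Bin, bins_to_collect, and lamp_brightness are hypothetical and do not come
from the cited systems.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    bin_id: str
    fill_level: float  # fraction of capacity in use, 0.0 to 1.0


def bins_to_collect(bins, threshold=0.8):
    """Return the IDs of bins whose fill level meets the assumed threshold."""
    return [b.bin_id for b in bins if b.fill_level >= threshold]


def lamp_brightness(ambient_lux, dark_lux=10.0, bright_lux=80.0):
    """Map ambient light to lamp output in [0, 1].

    Full power when it is dark, off when it is bright, and a linear ramp in
    between. The lux limits are illustrative assumptions.
    """
    if ambient_lux <= dark_lux:
        return 1.0
    if ambient_lux >= bright_lux:
        return 0.0
    return (bright_lux - ambient_lux) / (bright_lux - dark_lux)


readings = [Bin("A", 0.95), Bin("B", 0.40), Bin("C", 0.82)]
print(bins_to_collect(readings))  # ['A', 'C']
print(lamp_brightness(45.0))      # 0.5
```

Only bins A and C would be added to the collection round, which is where
the fuel and emission savings described above would come from.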


4.2.1.3 Improved public services 


The use of the IoT in public services considerably improves how cities meet
the needs of their residents. Real-time data from IoT sensors allows
continuous monitoring of water quality, supporting the early detection of
potential problems and ensuring safe drinking water [16]. Likewise, smart
sensors within fire hydrants can autonomously alert the relevant
authorities to possible emergencies, improving the effectiveness of
emergency response. These improvements to public services ultimately
contribute to a better quality of life for city residents, who benefit from
faster, more reliable, and safer services [17]. Figure 4.2 shows estimates
of IoT and non-IoT device connections from 2015 to 2025.


4.2.1.4 Smart transportation 


The IoT has brought about a major transformation in the transportation
sector, with effects ranging from optimizing traffic flow to improving
public transit systems. Smart traffic management systems use data collected
from a wide range of sensors to adjust traffic signals and reroute traffic
dynamically, thereby reducing congestion and shortening commute times [17].
In addition, integrating IoT technology into public transit systems
provides passengers with timely, up-to-date information, improving the
efficiency and convenience of urban travel. Smart transportation plays an
important role in shaping the future of urban environments by reducing
congestion and encouraging environmentally sustainable transport options
[19].

Figure 4.2 Total active device (non-Internet of Things and Internet of
Things) connections worldwide [18].


4.2.1.5 Environmental monitoring 


The IoT addresses important environmental monitoring concerns, particularly
air and water quality. Sensors strategically deployed across the city
continuously collect data on pollutants and environmental parameters. When
thresholds are exceeded, government bodies can promptly take action to
mitigate the harmful environmental consequences and protect public health
[20]. Continuous monitoring of environmental conditions serves the dual
purpose of protecting the natural environment and safeguarding the
well-being of city residents, making it an essential element of a healthier
and more sustainable urban environment.


4.2.1.6 Security and enhanced safety


IoT technology has been found to improve safety and security in cities
considerably, mainly through applications in surveillance and emergency
response. Connected surveillance cameras promptly transmit real-time alerts
to the police, allowing them to respond quickly and effectively to
potential threats. Moreover, integrating IoT devices into emergency
response systems enables precise identification of incident locations and
supports the effective allocation of resources [21]. These developments not
only improve physical security but also foster a sense of safety and
general well-being among city residents.


4.2.1.7 Sustainable citizen engagement 


The IoT empowers residents to be active participants in their cities.
Mobile applications, web platforms, and community engagement tools allow
residents to report issues, access real-time information, and take part in
decision-making processes. This engagement creates a more inclusive urban
environment in which citizens have a direct stake in shaping their cities
and improving the quality of services, fostering a sense of community
involvement and ownership [22].


4.2.1.8 Sustainable cost savings 


The long-term cost-saving potential of the IoT is a notable benefit for
urban areas. Effective resource allocation and the automation of processes,
such as predictive maintenance for critical infrastructure, can produce
substantial cost reductions over time. By proactively repairing
infrastructure before it fails, cities can reduce the financial burden and
operational delays associated with emergency repairs and service
disruptions [23]. In a similar vein, energy-efficient lighting and waste
management systems lower operating costs, allowing cities to allocate
resources more efficiently.


4.2.2 Blockchain technology in smart cities 


This subsection provides an overview of Blockchain technology, including
its principles and features. It explains the concepts of decentralization,
immutability, and consensus mechanisms in Blockchain. Blockchain technology
is transforming numerous sectors through its ability to increase
transparency, strengthen security, improve operational efficiency, reduce
costs, and build trust [24]. The principles and features of Blockchain have
significant implications for the future of data management and digital
trust, as seen in its prominent application in cryptocurrencies.


4.2.2.1 Core principles of Blockchain technology 


Blockchain technology runs on a decentralized computer network whose
participants are commonly referred to as nodes. Unlike traditional systems
that rely on a central authority or intermediary to validate and record
transactions, Blockchain operates through this decentralized network of
participants.


4.2.2.1.1 Decentralization 


Decentralization offers several benefits. First, Blockchain technology
improves transparency by ensuring that all participants in the network have
equal access to a shared ledger that is continually updated in real time.
Moreover, it strengthens security by removing single points of failure
[24]. Rather than depending on a single entity, transactions are verified
through network consensus, which makes it considerably harder for a
malicious actor to control the system.


4.2.2.1.2 Distributed ledger 


The basic concept underlying Blockchain technology is the distributed
ledger. The ledger serves as a comprehensive record of all transactions
within the network, organized sequentially into blocks, where each block
contains a set of transactions. Linking the blocks into a sequential chain
gives rise to the term “Blockchain.” Every participant in the network
maintains an identical copy of this ledger [25]. The distributed ledger
gives all participants consistent access to the transaction history, which
makes it very difficult to change or remove data without consensus from the
network.


4.2.2.1.3 Immutability 


Once a transaction is recorded in a block and added to the Blockchain, it
becomes extremely difficult to modify or remove. This immutability comes
from the use of cryptographic hash functions. Every block contains a set of
transactions and the cryptographic hash of the preceding block. The process
creates a secure sequence of interlinked blocks that, once built, is highly
resistant to tampering [26]. Modifying a single block requires modifying
all subsequent blocks, which is computationally impractical.


4.2.3 Key features of Blockchain technology
4.2.3.1 Security and transparency 


The high level of security in Blockchain is attributed to its decentralized
and immutable nature. Altering a single block would require altering all
subsequent blocks on the chain, which is computationally impractical. This
makes Blockchain resilient against fraudulent activity and hacking
attempts. The inherent transparency of Blockchain technology allows all
participants to access and inspect the complete transaction history [27].
Although individual transactions are pseudonymous, meaning that
alphanumeric addresses identify the parties, the data remains accessible
and verifiable. This property is useful for purposes such as auditing and
ensuring accountability.
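
The dependence of each block on the hash of its predecessor can be made
concrete with a small sketch. It is a simplified illustration, not the data
format of any particular Blockchain platform: the block layout and the
names block_hash, build_chain, and is_valid are assumptions, and consensus
and proof-of-work are omitted entirely.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, transactions, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Link each batch of transactions to the hash of the previous block."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, transactions in enumerate(batches):
        current = block_hash(index, transactions, prev_hash)
        chain.append({"index": index, "transactions": transactions,
                      "prev_hash": prev_hash, "hash": current})
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check every link back to the genesis value."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["transactions"],
                              block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain([["meter-1: 3 kWh"], ["meter-2: 5 kWh"], ["meter-1: 4 kWh"]])
print(is_valid(chain))  # True
chain[1]["transactions"] = ["meter-2: 1 kWh"]  # tamper with the middle block
print(is_valid(chain))  # False
```

Recomputing the hash of the altered block alone does not restore validity,
because the next block still stores the old hash; every later block would
have to be rebuilt, which is the cost that makes tampering impractical on a
real network.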


4.2.3.2 Smart contracts and speed and efficiency 


Blockchain technology supports smart contracts, which are self-executing
agreements whose terms are written directly into the underlying code. These
contracts execute autonomously when predefined conditions are met, thereby
reducing the need for intermediaries in a variety of operational processes
[28]. Blockchain technology can also substantially improve the speed and
efficiency of transactions, especially in cross-border payments and supply
chain management.


4.2.3.3 Accessibility 


The promise of Blockchain technology extends beyond finance and is driving
major changes in several fields, such as healthcare, supply chain
management, and voting systems. The principles of decentralization,
immutability, and transparency, combined with features such as smart
contracts, strengthen security, efficiency, and accountability in diverse
applications [28]. Nonetheless, challenges such as scalability, energy
efficiency, and regulatory frameworks must be addressed to fully realize
the capabilities of Blockchain technology.


4.3 INTEGRATION OF BLOCKCHAIN AND 
SECURE IOT APPLICATION 


In the 21st century, smart cities have become a transformative force in
contemporary urban planning. The accelerating global trend toward
urbanization demands innovative solutions to the challenges facing
traditional urban infrastructure. Challenges such as inefficient resource
management, rapid population growth, rising resource demands, security and
privacy concerns, and the importance of sustainability have pushed cities
to explore advanced technologies [29]. Among these, Blockchain and the IoT
have emerged as promising technologies with considerable potential to
transform smart city development. Integrating Blockchain and the IoT in
smart cities represents a paradigm shift in urban planning and development.
This section examines the integration of these two technologies, looking at
their strengths and at how combining them can address key challenges facing
traditional urban infrastructure. Through an analysis of the benefits, it
aims to convey the impact of bringing Blockchain and the IoT together in
smart cities [30].


4.3.1 Decentralized data management 


Blockchain provides an important foundation for decentralized and
tamper-proof data management in smart cities. Integrating it with the IoT
ensures that transactions and data originating from IoT devices are
securely recorded on an immutable ledger, thereby protecting data integrity
and preventing unauthorized modification by malicious actors [31]. Given
the large volume of real-time data generated by IoT sensors and devices,
Blockchain supports secure management without reliance on a central
authority, reducing the risk of data manipulation.


4.3.2 Secure identity and access management 


Blockchain’s secure and decentralized identity management capability is a
foundation for improving security and privacy within the smart city
ecosystem (Khalil et al., 2022). A higher level of security is achieved by
allowing residents, devices, and organizations to store their identities on
the Blockchain. This integration also addresses the authentication and
authorization needs of IoT devices, providing a robust framework for
managing identities and access permissions [32]. This approach ensures
secure interactions within the smart city network, strengthening the
overall resilience of the system.
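
A rough sketch of such an identity and permission check follows. It uses an
in-memory dictionary as a stand-in for on-chain storage, and the class
DeviceRegistry with its methods is hypothetical. A real deployment would
have the device prove possession of its private key by signing a challenge;
comparing a presented public key, as done here, is a deliberate
simplification.

```python
import hashlib


class DeviceRegistry:
    """Toy stand-in for an on-chain identity registry (hypothetical)."""

    def __init__(self):
        # device_id -> {"key_hash": hex digest, "permissions": set of actions}
        self._records = {}

    def register(self, device_id, public_key, permissions):
        """Record a device identity once; re-registration is rejected."""
        if device_id in self._records:
            raise ValueError("device already registered")
        self._records[device_id] = {
            "key_hash": hashlib.sha256(public_key).hexdigest(),
            "permissions": set(permissions),
        }

    def is_authorized(self, device_id, public_key, action):
        """Authenticate (key matches the record), then authorize the action."""
        record = self._records.get(device_id)
        if record is None:
            return False
        presented = hashlib.sha256(public_key).hexdigest()
        return presented == record["key_hash"] and action in record["permissions"]


registry = DeviceRegistry()
registry.register("meter-1", b"meter-1-public-key", {"report_usage"})
print(registry.is_authorized("meter-1", b"meter-1-public-key", "report_usage"))
print(registry.is_authorized("meter-1", b"meter-1-public-key", "open_valve"))
```

Keeping authentication and authorization in one lookup mirrors the idea
that identities and access permissions are managed in the same shared
record.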


4.3.3 Smart contracts for automation 


Integrating Blockchain and the IoT introduces smart contracts that automate
and enforce predefined rules, removing the need for intermediaries. In
smart cities, these contracts are useful for automating diverse processes
such as utility payments, waste management, and traffic control. IoT
devices with sensors and actuators can trigger events based on real-world
data. For example, combining the IoT with Blockchain-based contracts in
tourism finance can streamline processes by automatically executing actions
when specified conditions are met [33]. This integration improves
efficiency and supports sustainable financial practices in the tourism
sector.
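
The rule-driven behaviour of a smart contract can be imitated in plain
Python for a utility payment, one of the processes named above. This is a
sketch of the logic only: real smart contracts run on a Blockchain platform
in a contract language, and the class UtilityPaymentContract, its integer
currency units, and its rejection rule are illustrative assumptions.

```python
class UtilityPaymentContract:
    """Settles a payment automatically when a meter reports usage."""

    def __init__(self, price_per_kwh, balances):
        self.price_per_kwh = price_per_kwh  # integer currency units per kWh
        self.balances = dict(balances)      # account name -> integer balance
        self.log = []                       # append-only record of outcomes

    def report_usage(self, payer, provider, kwh):
        """Called by an IoT meter; pays the provider if the rule is met."""
        cost = kwh * self.price_per_kwh
        if kwh <= 0 or self.balances.get(payer, 0) < cost:
            self.log.append(("rejected", payer, kwh))
            return False
        self.balances[payer] -= cost
        self.balances[provider] = self.balances.get(provider, 0) + cost
        self.log.append(("settled", payer, kwh))
        return True


contract = UtilityPaymentContract(20, {"resident": 1000, "utility": 0})
print(contract.report_usage("resident", "utility", 30))  # True
print(contract.balances)  # {'resident': 400, 'utility': 600}
print(contract.report_usage("resident", "utility", 30))  # False
```

No intermediary decides whether to settle: the condition is checked and the
transfer is made by the code itself, and the second report is rejected
because the remaining balance cannot cover it.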


4.3.4 Supply chain transparency 


Blockchain’s transparent and traceable ledger finds applications in supply
chain management, ensuring the secure recording of transactions related to
the production, transport, and distribution of goods. Combined with IoT
sensors in the supply chain, it enables real-time data on the location,
condition, and status of goods [34]. When recorded on a Blockchain, this
data supports transparency and accountability, reducing the risk of fraud
and improving the overall efficiency of the supply chain.


4.3.5 Energy trading, data security and privacy 


Blockchain catalyzes peer-to-peer energy trading, providing a decentralized
and transparent method for energy transactions. In the IoT domain, smart
meters and sensors monitor energy consumption and production in buildings.
Integrating this data with Blockchain makes accurate and transparent
billing possible and supports dynamic pricing models [35]. This encourages
energy conservation and promotes the use of renewable energy sources,
particularly in households. The secure and decentralized Blockchain
architecture also improves data security, with cryptographic features
ensuring that encrypted data is accessible only to authorized parties. The
sensitivity of the data collected by IoT devices makes protecting privacy
extremely important [36]. Blockchain provides a decentralized control
mechanism, considerably lowering the risk of unauthorized access or
tampering.
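
As a hedged illustration of billing from smart meter readings under a
dynamic price model, the sketch below uses a simple two-tier time-of-use
tariff. The tariff values, the peak window, and the function names are
assumptions chosen for the example; actual pricing schemes in the cited
work may differ.

```python
def price_for_hour(hour, peak_price=30, off_peak_price=12,
                   peak_hours=range(17, 21)):
    """Price per kWh in integer currency units; 17:00-20:59 is peak here."""
    return peak_price if hour in peak_hours else off_peak_price


def bill(readings):
    """Total cost for an iterable of (hour, kwh) smart meter readings."""
    return sum(kwh * price_for_hour(hour) for hour, kwh in readings)


def trade_record(buyer, seller, hour, kwh):
    """Build the entry a peer-to-peer trade could write to the ledger."""
    price = price_for_hour(hour)
    return {"buyer": buyer, "seller": seller, "hour": hour,
            "kwh": kwh, "price": price, "total": kwh * price}


print(bill([(3, 2), (18, 1), (20, 3)]))          # 144
print(trade_record("house-7", "house-2", 18, 2)["total"])  # 60
```

Because the same price function feeds both the bill and the trade record,
anyone holding the ledger entries could recompute the totals, which is the
transparency property described above.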


4.3.6 Citizen-centric services 


Blockchain enables transparent and secure citizen services, spanning
activities such as voting, property registration, and healthcare records.
Its transparency increases trust in government services by giving residents
greater control over their data. Citizen engagement is further improved by
IoT devices that collect feedback and real-time information, with
Blockchain integration ensuring the authenticity of citizen input [37].
This fosters more accountable and responsive governance.


4.3.7 Real-time monitoring and analytics


As a secure and transparent framework, Blockchain facilitates the recording
and sharing of real-time data. This is especially valuable in applications
where data accuracy and integrity are critical. IoT devices continuously
generate real-time data on various city operations. Integration with
Blockchain allows this data to be used for analytics, decision-making, and
the optimization of city services, particularly in smart healthcare [38].
Collaboration between government agencies, the private sector, and
technology companies is therefore necessary to integrate Blockchain and the
IoT in smart cities effectively. Standards and protocols for
interoperability, security, and data privacy should be developed to ensure
a seamless and secure smart city ecosystem.


4.4 IOT SECURITY AND PRIVACY CONCERNS 


Integrating Blockchain and the IoT in smart cities promises to transform
data management and governance. Nevertheless, this synergy introduces a
range of security and privacy challenges that require careful examination.
This section breaks down these challenges and explores the potential
pitfalls and necessary safeguards.


4.4.1 Immutable data and access controls 


The inherent immutability of Blockchain technology, which ensures data
integrity, becomes a double-edged sword when Blockchain and the IoT are
integrated in smart cities. While this feature protects against data
tampering, it raises significant concerns about the permanence of sensitive
information. Within the complex ecosystem of a smart city, the inability to
modify or delete specific data poses potential risks to the long-term
privacy of individuals [38]. Developing robust access controls and
privacy-preserving mechanisms is critical to addressing this challenge.


4.4.2 Smart contract vulnerabilities 


Executing smart contracts, a foundational element of Blockchain-enabled 
IoT applications, introduces vulnerabilities that malicious actors may 
exploit. In smart cities, flaws in smart contracts can lead to unauthorized 
access, manipulation of data, or disruptions in automated processes, mir- 
roring the concerns observed in web applications. To fortify smart contracts 
against potential threats, a comprehensive approach involving rigorous code
audits and implementing secure coding practices is essential [39]. This
proactive strategy is paramount in ensuring the integrity of the automated
processes crucial for the seamless functioning of smart city infrastructures. 


4.4.3 Scalability and network consensus 


As the number of IoT devices and transactions grows, scalability and the
choice of an efficient consensus mechanism become central concerns.
Inefficient consensus algorithms and scalability bottlenecks can compromise
the security and real-time responsiveness of smart city infrastructure.
Addressing these obstacles calls for innovative solutions that improve the
scalability and performance of Blockchain networks [40]. By continually
advancing the technology underpinning these networks, we can better
accommodate the increasing demands of a growing smart city ecosystem.


4.4.4 Network security and consensus vulnerabilities 


The reliance on consensus algorithms to validate transactions within
Blockchain networks introduces vulnerabilities, particularly in large IoT
deployments. Malicious actors who exploit these vulnerabilities can
compromise the integrity of transactions and stored data. Advanced
consensus mechanisms and regular security audits are crucial for
strengthening network security and countering this risk [41]. Proactive
measures are needed to ensure the robustness of the consensus mechanisms,
protecting smart city infrastructure against possible attacks on the
Blockchain network.


4.4.5 Supply chain security for IoT devices


Securing the entire supply chain for IoT devices in smart cities is a
complex challenge. Tampering with or compromising IoT devices at any stage
of the supply chain can introduce vulnerabilities into the Blockchain
network, undermining the overall security of smart city systems. Strict
supply chain security measures and device attestation procedures are vital
to mitigating this risk [41]. A proactive and vigilant approach at every
stage of the supply chain is important for preserving the integrity of the
connected devices within the smart city landscape.


4.4.6 Identity management 


Integrating the IoT with Blockchain requires robust identity management
systems, raising concerns about the possible exposure of personally
identifiable information. Weak identity protection measures could
compromise a person’s privacy. Privacy-centric identity solutions, such as
zero-knowledge proofs, are therefore essential to reduce these risks [42].
By incorporating advanced privacy technologies, smart cities can strike a
delicate balance between the benefits of identity management and the
preservation of individual privacy.


4.4.7 Data linkability 


The transparent nature of Blockchain introduces the risk of data
linkability, allowing entities to correlate and analyze transactional data.
Persistent data linkability threatens the anonymity and privacy of
residents within the smart city landscape [43]. Continued advances in
cryptographic techniques and privacy-enhancing technologies are vital to
countering this risk, so that smart cities can harness the advantages of
Blockchain while protecting residents’ privacy. Blockchain and the IoT are
emerging technologies in the evolving landscape of smart cities.
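
To make the linkability risk concrete, the following sketch groups ledger
entries by pseudonymous address, as any observer of a transparent ledger
could, and then shows one possible countermeasure: deriving a fresh
pseudonym for each transaction. The HMAC-based derivation and the function
names are illustrative assumptions, not a technique taken from the cited
work, and fresh pseudonyms alone do not stop linkage through timing or
transaction content.

```python
import hashlib
import hmac
from collections import defaultdict


def link_by_address(transactions):
    """Group (address, payload) pairs by address to build activity profiles."""
    profile = defaultdict(list)
    for address, payload in transactions:
        profile[address].append(payload)
    return dict(profile)


def one_time_address(secret, counter):
    """Derive a per-transaction pseudonym from a secret only the owner holds."""
    message = str(counter).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]


# A reused address lets an observer join unrelated activities into one profile.
reused = [("addr-1", "bus tap 07:55"), ("addr-1", "clinic visit 09:10")]
print(len(link_by_address(reused)))  # 1

# Fresh pseudonyms split the same activity across unrelated-looking addresses.
secret = b"resident-secret"
fresh = [(one_time_address(secret, i), payload)
         for i, (_, payload) in enumerate(reused)]
print(len(link_by_address(fresh)))   # 2
```

The owner can regenerate every pseudonym from the secret and the counter,
while an observer without the secret sees addresses with no visible
connection.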


4.5 CHALLENGES, LIMITATIONS, AND IMPLICATIONS 


Within this section, the challenges and limitations of using Blockchain in 
smart cities are presented, as well as some implications. 


4.5.1 Challenges 
4.5.1.1 Scalability 


On the Blockchain side, the problem lies in efficiently handling the large
number of transactions occurring within a smart city’s dynamic and
interconnected environment. As the city’s infrastructure becomes
increasingly digitized, the load on the Blockchain network can reduce
transaction processing speed and overall system performance [44]. At the
same time, the proliferation of IoT devices aggravates scalability
concerns. The growing number of these interconnected devices, ranging from
sensors to smart appliances, increases the complexity of managing their
interactions seamlessly. Coordinating data exchange and communication among
a vast array of real-time devices requires robust scalability solutions to
avoid bottlenecks and ensure the smooth operation of smart city systems.


4.5.1.2 Interoperability 


Interoperability is a critical challenge in integrating Blockchain and IoT
within smart cities; it concerns seamless communication between diverse IoT
devices and Blockchain platforms. The intricate web of interconnected
devices with distinct protocols and standards poses a considerable obstacle.
IoT devices generate large volumes of data that must be efficiently and
securely communicated to the Blockchain for processing. Achieving a
standardized communication protocol that can accommodate the heterogeneity
of IoT devices is important [45]. Without a common language, the potential
benefits of the combined technologies may be compromised, hindering the
smart city’s ability to harness the full power of Blockchain and IoT
synergies.


4.5.1.3 Security concerns 


Security is a critical concern when integrating Blockchain and IoT in smart
cities. Blockchain inherently offers robust protection through its
decentralized and cryptographic principles. Nevertheless, vulnerabilities
can arise in smart contracts, the self-executing code on the Blockchain, or
within the network’s consensus algorithms [46]. IoT devices add another
layer of security challenges. Device vulnerabilities, if not properly
addressed, can be exploited by malicious actors, endangering the integrity
of the whole system. In addition, weaknesses in the communication between
IoT devices and the Blockchain network may expose sensitive information.
Secure communication protocols and robust authentication mechanisms are
essential to reducing these threats.


4.5.1.4 Energy consumption 


The energy consumption associated with Blockchain’s Proof-of-Work (PoW)
consensus algorithm is a considerable concern, especially when considering
its potential application in smart cities. PoW involves complex mathematical
puzzles that miners must solve to validate transactions and create new
blocks. This process requires significant computational power, resulting in
high energy consumption. In the context of smart cities, where
sustainability is a key focus, the environmental impact of PoW becomes a
notable limitation [47]. The energy-intensive nature of PoW can contribute
to larger carbon footprints and undermine the goal of creating efficient and
green urban environments. Addressing this challenge requires exploring
alternative consensus mechanisms known for their lower energy requirements.
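
To make the cost of PoW concrete, the following minimal Python sketch mines
a block by brute-force search for a nonce. It is an illustration under
stated assumptions, not the algorithm of any particular Blockchain: the
function names `mine` and `verify`, the `|` separator, the sample block
data, and the single SHA-256 pass are hypothetical choices made for the
example.

```python
import hashlib


def mine(block_data, difficulty_bits):
    """Search for a nonce whose SHA-256 digest falls below the target."""
    target = 1 << (256 - difficulty_bits)  # smaller target = harder puzzle
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


def verify(block_data, nonce, difficulty_bits):
    """Checking a solution needs one hash, unlike finding it."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
    return int(digest, 16) < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    # Hypothetical block content: one smart-meter reading.
    for bits in (8, 12, 16):
        nonce, digest = mine("meter-42:3.7kWh", bits)
        # nonce + 1 is the number of hashes tried before success
        print(bits, nonce + 1, digest[:16])
```

Each additional difficulty bit doubles the expected number of hash
evaluations, while verification still costs a single hash. This asymmetry
is what makes PoW expensive to run at the scale of a city.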


4.5.2 Limitations 
4.5.2.1 Cost 


The cost implications of implementing and maintaining a Blockchain
infrastructure present a considerable challenge for smart cities. The
expenses incurred in setting up the required hardware, software, and network
infrastructure, combined with ongoing operational and maintenance costs, can
strain the budgets of municipal authorities. Smart city projects typically
require significant investments, and adopting Blockchain technology
introduces additional financial commitment [48]. Costs are not limited to
the initial deployment; they also include spending on security, scalability,
and compliance over time. Addressing these economic factors is vital for
successfully integrating Blockchain in smart city initiatives and requires
careful budgeting and strategic planning to balance technological progress
with fiscal responsibility.


4.5.2.2 Regulatory uncertainty 


Regulatory uncertainty poses a substantial challenge to the widespread
implementation of Blockchain and IoT in smart cities. The lack of clear and
standardized regulations creates an atmosphere of uncertainty, discouraging
the public and private sectors from fully embracing these transformative
technologies [49]. The complex nature of Blockchain and IoT applications
demands comprehensive legal frameworks to address data privacy, security,
and interoperability issues. Without well-defined regulations, stakeholders
may hesitate to invest in or adopt these innovations for fear of legal
complications. It is therefore important for policymakers to collaboratively
design and establish clear regulatory guidelines that foster innovation
while addressing the unique challenges posed by the integration of
Blockchain and IoT in smart cities [50].


4.5.2.3 Adoption barriers 


The adoption of Blockchain and IoT technologies in smart cities faces
substantial obstacles, mainly rooted in resistance to change and the need
for a thorough overhaul of existing systems. Cities and urban
infrastructures usually operate on well-established frameworks, and
introducing disruptive technologies like Blockchain and IoT requires a
fundamental shift in mindset and infrastructure [51]. Stakeholders may be
reluctant to embrace these changes because of concerns about the complexity
of integration, possible disruptions during the transition, and the costs
involved. Overcoming these adoption barriers demands strategic planning,
clear communication of the benefits, and gradual implementation strategies
that reduce the perceived risks associated with such transformative
technologies [52].


4.5.3 Implications 
4.5.3.1 Data integrity 


Data integrity is vital in smart cities, where IoT devices continuously
generate large volumes of data (Andoni et al., 2019). The integration of
Blockchain technology plays a critical role in strengthening data integrity
within this dynamic ecosystem. Blockchain’s inherent immutability ensures
that once data is recorded on the Blockchain, it cannot be altered or
tampered with. For IoT devices, this property is especially important
because it safeguards the authenticity and reliability of the collected data
[53]. Each transaction or piece of data from an IoT device is securely
recorded in a block, and the decentralized and distributed nature of the
Blockchain ensures that no single entity controls the whole system, reducing
the risk of unauthorized modifications. This builds trust in the data
produced by IoT devices and provides a robust foundation for numerous
applications, such as smart contracts and transparent data-sharing systems
in smart city infrastructure [54]. The assurance of data integrity through
Blockchain integration not only improves the reliability of information but
also contributes to the overall security and trustworthiness of smart city
systems.
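
The tamper evidence behind this immutability claim can be sketched in a few
lines of Python. The block layout, the field names, and the helper names
(`block_hash`, `build_chain`, `is_valid`) are assumptions made for
illustration. The sketch only shows that a modification is detectable; a
real Blockchain adds consensus among nodes, which is what makes rewriting
history impractical rather than merely visible.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, reading, prev_hash):
    """Hash the block content together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "reading": reading, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(readings):
    """Record each sensor reading in a block linked to its predecessor."""
    chain, prev = [], GENESIS
    for index, reading in enumerate(readings):
        digest = block_hash(index, reading, prev)
        chain.append(
            {"index": index, "reading": reading, "prev": prev, "hash": digest}
        )
        prev = digest
    return chain


def is_valid(chain):
    """Recompute every hash and check every link."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["reading"], block["prev"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain([21.5, 21.7, 22.1])  # hypothetical temperature data
    print(is_valid(chain))
    chain[1]["reading"] = 99.9               # tamper with a stored reading
    print(is_valid(chain))
```

Because every block hash covers the previous hash, changing one stored
reading invalidates that block and breaks the link to every later block.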


4.5.3.2 Decentralization 


Decentralization plays an essential role in strengthening the resilience of
smart city systems that implement Blockchain technology. By distributing the
control and storage of data across a network of nodes instead of relying on
a central authority, Blockchain mitigates the risk of a single point of
failure [55]. In the context of smart cities, this decentralized design
ensures that if one node or component fails, the system as a whole remains
operational. This added robustness is especially critical in urban settings
where the reliability of data and systems is vital for services such as
transportation, energy management, and public safety [18].


4.5.3.3 Efficiency gains 


The use of Blockchain and IoT technologies in smart cities can
substantially improve operational efficiency. With its decentralized and
secure nature, Blockchain ensures the integrity and immutability of data
collected from IoT devices. This is especially important in a smart city
context, where large volumes of data are produced by varied sources such as
sensors, cameras, and other connected devices [56]. One key contributor to
efficiency gains is the transparent and automated execution of smart
contracts. These self-executing agreements, made possible by Blockchain, can
automate various processes in smart cities, ranging from energy management
and waste disposal to traffic control. For example, smart contracts can
facilitate automated payments for energy use based on real-time data from
IoT-enabled meters, eliminating the need for intermediaries and streamlining
the billing process. Furthermore, the decentralized nature of Blockchain
removes the dependence on a single central authority, reducing the risk of
system failures or data breaches [18]. This decentralization, coupled with
the secure and transparent audit trail provided by Blockchain, improves the
overall reliability of smart city operations.
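
The automated settlement of energy payments mentioned above can be
illustrated with a small Python stand-in for a smart contract. This is a
sketch only: a production contract would run on a Blockchain platform and be
written in that platform’s contract language, and the class name, method
names, household identifier, and tariff used here are hypothetical.

```python
class EnergyBillingContract:
    """Toy stand-in for an on-chain billing contract (hypothetical names)."""

    def __init__(self, price_cents_per_kwh):
        self.price = price_cents_per_kwh
        self.balances = {}        # prepaid balance per household, in cents
        self.utility_revenue = 0  # total amount settled to the utility
        self.log = []             # append-only audit trail

    def deposit(self, household, cents):
        """Top up a household's prepaid balance."""
        self.balances[household] = self.balances.get(household, 0) + cents

    def report_usage(self, household, kwh):
        """Called with a meter reading; settles immediately by fixed rules."""
        cost = kwh * self.price
        if self.balances.get(household, 0) < cost:
            self.log.append((household, kwh, "rejected"))
            return False
        self.balances[household] -= cost
        self.utility_revenue += cost
        self.log.append((household, kwh, "settled"))
        return True


if __name__ == "__main__":
    contract = EnergyBillingContract(price_cents_per_kwh=30)
    contract.deposit("household-1", 1000)
    contract.report_usage("household-1", 10)  # reading from an IoT meter
    print(contract.balances, contract.utility_revenue, contract.log)
```

Amounts are kept as integer cents so that settlement is exact. No
intermediary decides whether to bill: the rule in `report_usage` is applied
to every reading, and each outcome is appended to the audit log.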


4.5.3.4 Transparent governance 


Transparent governance is a key aspect of integrating smart contracts on the
Blockchain within smart cities. Smart contracts, self-executing pieces of
code with predefined rules, introduce transparency and automation that can
transform governance processes. The immutability of Blockchain ensures that
the terms encoded in these contracts are tamper-proof, fostering trust among
stakeholders [5,9]. This transparency reduces the risk of fraudulent
activities and allows citizens and authorities alike to trace and verify
every step of a process. Automated execution of contracts further
streamlines governance, reducing the need for intermediaries and the
likelihood of human error [59]. As smart cities evolve, implementing smart
contracts becomes a cornerstone of accountable and efficient governance
structures, setting a precedent for a new era of transparent and automated
public administration.


4.6 FUTURE DIRECTIONS 


Future directions for integrating Blockchain and IoT in smart cities hold
tremendous potential for transformative developments. Research efforts
should consider exploring hybrid Blockchain solutions that combine the
strengths of public and private Blockchains, offering a nuanced approach to
scalability and privacy concerns. Exploring the synergy between Blockchain,
IoT, and edge computing can reduce latency and improve real-time processing
capabilities. Addressing security and privacy challenges remains critical,
with a focus on developing innovative methods to protect the sensitive data
produced by IoT devices.

Future studies should also explore energy-efficient consensus mechanisms to
reduce the environmental impact of Blockchain in smart city applications.
Standardizing interoperability between diverse IoT devices and Blockchain
platforms is essential for seamless integration. Exploring the application
of smart contracts for automating city governance processes and improving
transparency in supply chain management represents a promising avenue. In
addition, researchers should contribute to the development of adaptive
regulatory frameworks that accommodate the evolving landscape of Blockchain
and IoT technologies in smart cities. Attention to user-centric design,
ethical considerations, and the integration of artificial intelligence will
collectively shape a sustainable and inclusive future for smart city
applications.


4.7 CONCLUSION 


Blockchain technology is leading a digital transformation, changing how we
view and manage trust and data in a connected world. Decentralization,
distributed ledger technology, immutability, and consensus mechanisms are
its cornerstones, reshaping transaction verification and data security.
Decentralization allows peer-to-peer networks to validate and record
transactions without a central authority, challenging hierarchical
structures. This shift makes it possible for individuals to trust each other
directly, removing intermediaries. The distributed ledger maintains a
synchronized and immutable transaction history, promoting transparency and
deterring fraud. Blockchain’s immutable ledger and cryptographic protection
provide strong data integrity and cybersecurity. Transparency and smart
contracts improve accountability and automation across industries, changing
commercial transactions. Blockchain’s efficiency can speed up processes,
reduce operational costs, and promote innovation in finance, supply chain
management, healthcare, and governance. Blockchain is already transforming
markets, yet scalability, energy consumption, and legal frameworks remain
concerns. It promises to advance trust, transparency, and efficiency in our
dynamic digital landscape, and it provides the foundation for a
decentralized, secure, and trust-based future with long-lasting effects.


5.1 INTRODUCTION


The advancement and widespread use of cyber-physical systems (CPSs) have
enabled physical equipment to perform five key functions through the use of
computers and networks: computation, communication, precise control, remote
coordination, and autonomy [1]. Implementing a CPS can significantly enhance
the competitiveness of key industrial sectors, including the mobile
healthcare network (MHN). The structure of the MHN is shown in Figure 5.1.
In the MHN [2], the mobile device is tasked with collecting data and
uploading it to the healthcare cloud server. Patients and doctors hold a
substantial amount of data that needs to be stored and accessed on the
healthcare cloud server. The hospital is responsible for overseeing the
enrollment of mobile devices and operating the healthcare cloud server. The
healthcare cloud server manages E-healthcare data and provides services
including data storage, uploading, and downloading. E-healthcare data in
this system are a valuable resource for disease management and control,
scientific research, and teaching, and have attracted increasing attention
[3].

Figure 5.1 Data delegation [14].

Data sharing has become an essential structural element in this setting,
enabling real-time data exchange between medical professionals and patients
as well as real-time monitoring. Security measures for data sharing are
crucial to prevent unauthorized access and safeguard the data in the MHN.
The attribute-based encryption (ABE) public key cryptosystem offers
fine-grained control over ciphertext access. ABE ties the ciphertext and the
key together through attributes, allowing decryption only when a user’s
secret key matches the attributes of the ciphertext. In Key-Policy
Attribute-Based Encryption (KP-ABE), a message is encrypted under attributes
such as “profession: nurse, sex: female, and institution: hospital A,” while
keys are generated from access policies such as “profession: nurse AND sex:
female.” A ciphertext can be decrypted only when its attributes satisfy the
access policy. Ciphertext-policy attribute-based encryption (CP-ABE)
reverses these roles: the ciphertext is linked to an access policy and the
key is linked to attributes [4].
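
The difference between KP-ABE and CP-ABE lies in where the policy lives. The
following Python sketch models only that access decision and performs no
encryption at all; the nested-tuple policy format and the function names
are assumptions introduced for illustration, and the attribute strings are
taken from the nurse example above.

```python
def satisfies(policy, attributes):
    """Evaluate a nested ('AND' | 'OR', child, ...) policy tree.

    A leaf is a plain attribute string; it is satisfied when the string
    is present in the given attribute set.
    """
    if isinstance(policy, str):
        return policy in attributes
    operator, *children = policy
    results = (satisfies(child, attributes) for child in children)
    return all(results) if operator == "AND" else any(results)


def kp_abe_can_decrypt(key_policy, ciphertext_attributes):
    """KP-ABE: the policy is in the key, the attributes label the ciphertext."""
    return satisfies(key_policy, ciphertext_attributes)


def cp_abe_can_decrypt(key_attributes, ciphertext_policy):
    """CP-ABE: the policy is in the ciphertext, the attributes are in the key."""
    return satisfies(ciphertext_policy, key_attributes)


if __name__ == "__main__":
    ciphertext_attributes = {
        "profession:nurse", "sex:female", "institution:hospital A",
    }
    nurse_key = ("AND", "profession:nurse", "sex:female")
    doctor_key = ("AND", "profession:doctor", "sex:female")
    print(kp_abe_can_decrypt(nurse_key, ciphertext_attributes))
    print(kp_abe_can_decrypt(doctor_key, ciphertext_attributes))
```

In an actual ABE scheme this check is not run by a trusted party. It is
enforced by the mathematics of decryption, which succeeds only when the
attributes satisfy the policy.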

ABE is a valuable cryptographic technology that permits the secure transfer
of information to untrusted repositories, such as external web servers. ABE
facilitates the efficient sharing of data among many stakeholders based on
their respective roles or attributes. Conventional encryption methods
provide end-to-end encryption but require data owners either to share
decryption keys or to store the data in several encrypted copies under
distinct keys. Neither choice is suitable. ABE can reduce the burden of key
management in comparison with conventional encryption techniques while
offering fine-grained access control and end-to-end encryption. Malicious
users may obtain ciphertexts from the public repository that do not match
their secret keys. Without a matching decryption key, however, they cannot
recover the content of the data. In addition, KP-ABE provides user privacy
safeguards. In KP-ABE [5], a decrypting user may access an authorized
ciphertext without knowing who encrypted it. The data owner only has to
identify the categories of users who are allowed to decrypt the ciphertext.
Preserving the privacy and security of the MHN has evident advantages, but
certain problems need to be resolved when resource-constrained devices are
involved. Such devices include those with limited computational capability,
limited storage capacity, or both, such as iPads and smartphones.


5.2 RELATED WORK HEALTHCARE AND
ATTRIBUTE-BASED ENCRYPTION


An increasing number of schemes now prioritize the issue of revocation. One
line of work builds a revocable encryption system that supports indirect
revocation, in which each attribute in the system has a specified period of
validity. The authority regularly updates the attributes and redistributes
the user’s key information.


5.2.1 Optimized variable attribute-based encryption 


Y. Dong et al. [6] present a prototype that enhances the robustness of the
blockchain scheme by protecting it against attacks on central points. The
system employs multi-authority ABE (MA-ABE) to obviate the need for a
centralized authority, thereby protecting the security and confidentiality
of the data stored on the blockchain. The authors assess the system’s
performance and demonstrate the practicality of constructing a redactable
blockchain with access control, offering a pragmatic solution for
organizations that seek secure data exchange while retaining control over
content.

The system combines attribute-based encryption with a chameleon hash
function to strengthen security and privacy within the blockchain system.
The results have significant implications for companies and sectors that
handle confidential data, such as healthcare, banking, and government, where
safeguarding privacy and ensuring data security are of utmost importance.
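
The source does not state which chameleon hash is used in [6]. To show why
such a function makes a blockchain redactable, the sketch below uses a
textbook discrete-logarithm construction with toy parameters: whoever holds
the trapdoor can replace a message without changing its hash. The prime
1019, the generator 4, and all function names are assumptions chosen for
illustration and are far too small for real use.

```python
import hashlib

P = 1019  # toy safe prime, P = 2 * Q + 1
Q = 509   # prime order of the subgroup generated by G
G = 4     # a square modulo P, so it generates the order-Q subgroup


def to_zq(message):
    """Map a message string to an exponent modulo Q."""
    digest = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(digest, "big") % Q


def public_key(trapdoor):
    """Public key h = G^x mod P for the secret trapdoor x."""
    return pow(G, trapdoor, P)


def chameleon_hash(pk, message, r):
    """CH(m, r) = G^m * h^r mod P."""
    return (pow(G, to_zq(message), P) * pow(pk, r, P)) % P


def find_collision(trapdoor, message, r, new_message):
    """Solve m + x*r = m' + x*r' (mod Q) for r' using the trapdoor x."""
    delta = (to_zq(message) - to_zq(new_message)) % Q
    return (r + delta * pow(trapdoor, -1, Q)) % Q


if __name__ == "__main__":
    x = 123                      # trapdoor held by the authorized redactor
    pk = public_key(x)
    old_hash = chameleon_hash(pk, "original record", 77)
    new_r = find_collision(x, "original record", 77, "redacted record")
    print(old_hash == chameleon_hash(pk, "redacted record", new_r))
```

Since the hash stored in the next block stays valid after the edit, the
chain remains consistent. Controlling who holds the trapdoor is therefore
the access-control problem that ABE is used to address.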

Roop Ranjan et al. [7] propose a collaboration approach that uses
blockchain as a means of exchanging messages and show that the access
polynomial is an effective method for quickly revoking access in cloud
storage. The research suggests using an access polynomial for key
distribution, which minimizes the number of messages shared among group
members. The proposed method offers a robust and distributed solution for
sharing files in the cloud, protecting the confidentiality of data and the
anonymity of users. It allows user access to be removed quickly without the
need to communicate with users whose access has not been revoked, thereby
improving the efficiency of access control in cloud storage. The scheme uses
blockchain technology to enforce access control through a smart contract,
removing the need for a trusted entity and mitigating denial-of-service
threats. ABE allows the data owner to enforce access controls on files while
maintaining user anonymity and improving privacy protection. The evaluation
indicates that the proposed system is highly scalable, making it appropriate
for large-scale deployments involving up to 20,000 users. The use of smart
contracts eliminates single points of failure and reduces system maintenance
costs compared with traditional alternatives.
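
The general idea of an access polynomial can be sketched as follows. The
construction shown, A(x) = (x - id_1)...(x - id_n) + K over a prime field,
is the common textbook form and is an assumption here, since the source
does not give the exact polynomial used in [7]. The modulus, the member
identifiers, and the function names are likewise hypothetical.

```python
PRIME = 2**127 - 1  # Mersenne prime used as the field modulus (assumption)


def build_access_polynomial(member_ids, group_key):
    """Return coefficients (lowest degree first) of prod(x - id) + key."""
    coeffs = [1]
    for member_id in member_ids:
        # Multiply the current polynomial by (x - member_id).
        shifted = [0] + coeffs
        scaled = [(-member_id * c) % PRIME for c in coeffs] + [0]
        coeffs = [(a + b) % PRIME for a, b in zip(shifted, scaled)]
    coeffs[0] = (coeffs[0] + group_key) % PRIME
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x with Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result


if __name__ == "__main__":
    members = [101, 202, 303]  # secret per-user values (hypothetical)
    published = build_access_polynomial(members, group_key=424242)
    print([evaluate(published, m) for m in members])

    # Revoke user 202: publish a new polynomial carrying a fresh key.
    republished = build_access_polynomial([101, 303], group_key=555555)
    print(evaluate(republished, 101), evaluate(republished, 202))
```

Each member recovers the key by evaluating the published polynomial at a
private value. Revocation only requires publishing a new polynomial, so the
remaining members receive no individual messages, which matches the
low-communication property described above.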

L. Wu and J. Du [8] introduce an access control strategy for implanted
medical devices (IMDs) that relies on a proxy device, such as a smartphone,
to perform complex cryptographic calculations. This approach effectively
extends the lifespan of the IMD. The proposed system uses CP-ABE to provide
fine-grained access control based on the qualifications of the programming
operator, so that only qualified and authorized users may access the IMDs.
The scheme is implemented on real emulator devices, and the empirical
findings show that it is practical and efficient, offering a pragmatic
solution for safeguarding IMDs.

M. Mahdavi et al. [9] introduce KP-ABE techniques that use fewer pairing
operations than earlier KP-ABE schemes. Elliptic curve groups are used to
guarantee shorter keys. The study also proposes secure techniques for
delegating computationally intensive tasks, such as scalar multiplication of
a curve point, exponentiation, and pairing, in fuzzy identity-based
encryption (FIBE) and KP-ABE schemes. These solutions aim to enhance the
ability of Internet of Things (IoT) devices to handle complex operations.
However, the article lacks a thorough assessment of the proposed solutions
with respect to their performance, efficiency, or feasibility in real-world
IoT settings. Further study and testing may be required to determine the
efficacy and practicality of the proposed methods in real IoT deployments.

N. Arivazhagan et al. [10] introduce a new approach called CT-MA-ABE
(Cross-Trust Multiple Authorization Attribute-Based Encryption) to tackle
the trust problems that arise in cross-border data exchange. The method
incorporates the role of a “notary” in cross-border exchanges, addressing
the lack of oversight in fully decentralized alternatives while also
accounting for the question of trust in centralized systems. The system has
been deployed in legal jurisdictions of South China, including Zhuhai and
Macau, where it serves as a crucial infrastructure component for securing
data exchanges. This successful deployment further highlights the system’s
efficacy as a dependable and secure solution.

F. Meng [11] observes that researchers have proposed several
Object-Oriented Attribute-Based Key Sharing (OOABKS) approaches to address
this difficulty. These approaches improve the efficiency of creating
ciphertexts online by pre-computing intermediate ciphertexts offline.

Nevertheless, the author identifies several constraints in current OOABKS
schemes: (1) Certain schemes offer limited flexibility because an access
structure is embedded in the intermediate ciphertext, which therefore cannot
be used to generate a final ciphertext encrypted under a different access
structure. (2) Other schemes fail to meet the fundamental security
requirement.

After a thorough examination of current OOABKS systems, the study provides
an improved OOABKS technique with verifiable security. The technique
preserves the benefits of online/offline ciphertext creation seen in prior
OOABKS systems while resolving their constraints. The experimental findings
show that the modified system attains efficiency similar to that of prior
OOABKS schemes while maintaining both flexibility and security.

PR. Kumar et al. [12] note that revocation in most Revocable
Attribute-Based Encryption (RABE) schemes is carried out by the cloud
storage provider (CSP). Because the CSP is an honest-but-curious
intermediary, there is no assurance that the plaintext associated with the
updated ciphertext will remain unchanged after revocation. Furthermore, the
majority of attribute-based encryption systems suffer from the key escrow
problem. To address these problems, the authors propose a highly efficient
RABE technique that not only ensures data integrity but also resolves the
issue of key escrow.

PR. Kumar et al. [13] argue that, to be useful, an asymmetric searchable
encryption (ASE) scheme must support expressive search queries, which may be
formulated as conjunctions, disjunctions, or arbitrary Boolean formulae. The
study introduces a novel and efficient ASE scheme, FEASE (Fast and
Expressive Asymmetric Searchable Encryption), which offers both speed and
expressiveness. Searching any set of keywords, regardless of its size, takes
only three pairing operations, and the encryption and trapdoor algorithms
have linear complexity in the number of keywords. FEASE is built on a novel
and efficient underlying scheme, presented as the authors’ first
contribution, which is noteworthy in its own right. To strengthen security
against keyword-guessing attacks, the authors extend the FEASE scheme. FEASE
is reported to outperform all currently available expressive ASE
constructions.

J. Lee et al. [14] observe that Personal Health Records (PHRs) store
private information, which raises privacy concerns. Moreover, in medical
emergencies, it is important to consider multiple authorities to handle
delegation. While different approaches to data sharing are being researched,
they often fail to meet the security standards required in a real PHR
sharing environment. The study introduces a system that uses key aggregate
searchable encryption (KASE) to meet security requirements and uses
blockchain and smart contracts to enhance data integrity, maintain data
audit records, and ensure transparency. The authors present a mechanism that
guarantees the rights of individuals who own PHR data when they delegate
various rights using attribute tokens. They perform both formal and informal
security evaluations to assess the resilience of the proposed system against
possible adversarial attacks.

PR. Kumar et al. [15] consider proxy re-encryption (PRE), which enables a
partially trusted intermediary, equipped with a re-encryption key, to
convert a ciphertext encrypted under one key into an encryption of the same
message under a different key. Attribute-based proxy re-encryption (AB-PRE)
is an extension of PRE that allows fine-grained control over access to
encrypted data and the ability to delegate access rights. Nevertheless,
conventional AB-PRE suffers from a single point of failure because it
depends on a single proxy to perform ciphertext transformations. To address
this issue, the work presents a novel concept known as attribute-based
threshold proxy re-encryption (AB-TPRE), which employs N proxies for the
transformations. In AB-TPRE, the re-encryption key is divided into N shares,
with each proxy receiving one share and using it to produce a transformed
ciphertext share. The transformed ciphertext can be successfully constructed
only when at least a threshold number t of transformed ciphertext shares are
combined. In addition, the authors propose a share-updatable feature to
mitigate the security risk associated with the potential leakage of shares.
This property enables the re-encryption key shares to be refreshed.
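
The t-of-N sharing and share refreshing described above can be illustrated
with Shamir secret sharing, used here as a stand-in: the source does not say
which sharing scheme AB-TPRE relies on, and in AB-TPRE the proxies combine
transformed ciphertext shares rather than reconstructing the key itself. The
modulus, the sample key value, and the function names are assumptions made
for this sketch.

```python
import secrets

PRIME = 2**127 - 1  # field modulus (assumption)


def _evaluate(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result


def split(secret, n, t):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return {x: _evaluate(coeffs, x) for x in range(1, n + 1)}


def combine(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def refresh(shares, t):
    """Re-randomize all n shares by adding a fresh sharing of zero.

    Expects the full share set with x-coordinates 1..n, as made by split.
    """
    zero_shares = split(0, len(shares), t)
    return {x: (y + zero_shares[x]) % PRIME for x, y in shares.items()}


if __name__ == "__main__":
    key = 123456789                    # stand-in for a re-encryption key
    shares = split(key, n=5, t=3)      # one share per proxy
    print(combine({x: shares[x] for x in (1, 3, 5)}) == key)
    fresh = refresh(shares, t=3)
    print(combine({x: fresh[x] for x in (2, 4, 5)}) == key)
```

After a refresh, every share has changed but any t new shares still
determine the same secret, so shares leaked before the refresh are no longer
useful together with shares issued after it.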

The current academic literature lacks a comprehensive Systematic Literature
Survey (SLR) on the significance of ABE in health services and electronic
health records (EHRs). Such a survey would provide valuable insights into
past, present, and ongoing advances in ABE. It is well acknowledged among
researchers that, before starting any research project, an investigator
conducts a comprehensive literature review to choose the specific area of
focus for future investigation. Therefore, it is crucial to conduct a
comprehensive survey that encompasses all enhancements made to the ABE
scheme up to the present day.

This literature review consolidates and outlines all ABE-based method- 
ologies that have been specifically applied to health services since 2012. 
It includes updated versions and implementations across various specific 
categories, with the aim of employing explicit techniques to determine reli- 
able conclusions based on these investigations. The main objective of this 
thorough and extensive study is to not only summarize existing research 
and literature, but also provide a framework for future research endeavors 
using ABE in the field of health services and other domains. 


5.3 METHODOLOGY 


The primary goal of an SLR is to comprehensively examine and include all
relevant literature and studies related to a certain set of research topics
and areas of interest. The research process often begins with curiosity
about a particular topic, but familiarity with the subject aids in
formulating an appropriate research question. The investigators used a
two-stage screening method to extract findings and papers from the
databases. In the first stage, they selected studies based on
Boolean-combined search strings, inclusion/exclusion criteria, and research
questions. In the second stage, they extracted the core results and papers
from the databases.

In this discussion, the word “Domain” refers to the area in which a
particular ABE system was presented and on which it is based. This study
considers seven categories of domains: “CP-ABE,” “KP-ABE,” “Hybrid,”
“Multiauthority-ABE,” “Searchable Encryption-ABE,”
“Blockchain/Decentralized,” and “Hierarchical ABE.” As its name suggests,
the “Hybrid” domain covers combinations of two distinct approaches that
could not be accommodated in the other domains (see Figure 5.1).

Fine granularity can be achieved with Multiauthority-ABE, an approach that
allows distributed access rights to be granted to the data owner. This part
examines the literature on the multi-authority structure. Each study is
highlighted and examined according to its year of publication, from 2013 to
2021, taking into consideration its key components, such as the objectives,
procedures, and enhancements it proposes.

Blockchain, shown in Figure 5.2, is a decentralized technology that does
not need a central authority to function. It is built on the foundation of
distributed ledgers and consensus frameworks. As a result, a number of
blockchain-based alternatives that eliminate the need for an attribute
authority, along with analogous techniques, are highlighted and discussed in
the following paragraphs, covering the years 2019 through 2021.


5.3.1 A novel Key-Policy Attribute-Based 
Encryption (KP-ABE) 


Here, researchers provide a novel KP-ABE [16] method that uses the concept
of the Delegable identity-based broadcast encryption strategy to achieve
ciphertexts of fixed size (see Figure 5.3). The construction of KP-ABE is
detailed as follows:


Figure 5.3 Data encryption and decryption.

Setup(λ, U): Given the security parameter λ, the trusted attribute authority
selects three cyclic groups, $G_1$, $G_2$, and $G_T$, of prime order $p$,
with a bilinear pairing $e: G_1 \times G_2 \to G_T$. The trusted attribute
authority then selects two generators, $g$ from $G_1$ and $h$ from $G_2$,
together with a secret value $\alpha$ from $\mathbb{Z}_p^{*}$, the set of
non-zero integers modulo $p$, and a cryptographic hash function $H$ that maps
binary strings to non-zero integers modulo $p$. The security analysis models
$H$ as a random oracle. The master secret key is $msk = (g, \alpha)$. The
public parameters are
$params = (w, v, h, h^{\alpha}, \ldots, h^{\alpha^{n}})$, where
$w = g^{\alpha}$ and $v$ is the pairing of $g$ and $h$:

\[ v = e(g, h) \tag{5.1} \]


KeyGen(params, msk, (M, ρ)): The KeyGen algorithm generates a private key
for an access structure associated with the linear secret sharing scheme
(LSSS) $(M, \rho)$, using params, msk, and $(M, \rho)$. It first produces
shares of the secret using the LSSS $(M, \rho)$. Specifically, it selects a
column vector $\beta = (\beta_1, \beta_2, \ldots, \beta_k)^{T}$, where $k$
is the number of columns of $M$, such that $\beta_1 = 1$ is the shared
secret and $\beta_2, \ldots, \beta_k \in_R \mathbb{Z}_p$. For each $i$ from
1 to $\ell$, where $\ell$ is the number of rows of $M$, it computes
$\lambda_i = \langle M_i, \beta \rangle$, the inner product of the $i$-th
row $M_i$ and $\beta$. It then sets $SK_{(M,\rho)}$ according to the
following rules:


Kis ={p,(K.,)"_,} _ jen (ete ) i i (5.2) 


i=1 i=l jy 
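The share-generation step of KeyGen can be illustrated with a small sketch. It assumes a toy prime modulus and a hypothetical two-row matrix M for the policy "doctor AND cardiology"; the attribute names, the matrix, and the function names are illustrative and do not come from the scheme in [16].

```python
import secrets

P = 2**61 - 1  # toy prime modulus (assumption); real schemes use the pairing group order p

# Hypothetical LSSS matrix for the policy "doctor AND cardiology".
# Row i of M belongs to the attribute RHO[i].
M = [[1, 1],
     [0, -1]]
RHO = ["doctor", "cardiology"]


def make_shares(matrix, secret=1):
    """Return the shares lambda_i = <M_i, beta> mod P, with beta = (secret, random, ...)."""
    width = len(matrix[0])
    beta = [secret] + [secrets.randbelow(P) for _ in range(width - 1)]
    return [sum(m * b for m, b in zip(row, beta)) % P for row in matrix]


def reconstruct(shares, constants):
    """Recombine the shares with reconstruction constants mu_i."""
    return sum(mu * lam for mu, lam in zip(constants, shares)) % P


shares = make_shares(M)
# For this matrix, mu = (1, 1) satisfies mu_1 * M_1 + mu_2 * M_2 = (1, 0),
# so the weighted sum of the shares returns the first entry of beta.
recovered = reconstruct(shares, [1, 1])
```

Both shares are needed here because the policy is an AND; a holder of only one attribute sees a value that is masked by the random entry of beta.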


Encrypt(params, m, W): Let $t$ be the cardinality of the attribute set $W$,
and write

\[ W = \{\omega_i\}_{i=1}^{t} \tag{5.3} \]

The sender selects a random element $s$ from $\mathbb{Z}_p^{*}$, the set of
non-zero integers modulo $p$. Using this value, the sender calculates the
ciphertext $c = (c_0, c_1, c_2)$ as follows: $c_0$ is obtained by
multiplying $m$ by $v^{s}$, $c_1$ is obtained by raising $w$ to the power
$-s$, and $c_2$ is obtained by raising $h$ to the power of $s$ times a
product of $t$ factors, where the $i$-th factor adds $\alpha$ to the hash
value of $\omega_i$, for each $i$ from 1 to $t$:

\[
\begin{aligned}
c_0 &= m \cdot v^{s} = m \cdot e(g, h)^{s}, \\
c_1 &= w^{-s} = g^{-\alpha s}, \\
c_2 &= h^{\, s \prod_{i=1}^{t} (\alpha + H(\omega_i))}
\end{aligned}
\tag{5.4}
\]


Decrypt: To decrypt the ciphertext $c$, labeled with the attribute set
$W = \{\omega_i\}_{i=1}^{t}$, using the secret key $SK_{(M,\rho)}$, the
receiver parses $c$ as $c = (c_0, c_1, c_2)$. The receiver first defines the
index set

\[ I = \{\, i : \rho(i) \in W \,\} \tag{5.5} \]

and proceeds to compute the reconstruction constants

\[ \{\mu_i\}_{i \in I} = \mathrm{Recon}((M, \rho), W) \tag{5.6} \]

using the Recon function. The decryption key for the LSSS $(M, \rho)$ is
represented as

\[ SK_{(M,\rho)} = \{ D_i, (K_{i,j})_{j=1}^{n} \}_{i=1}^{\ell} \tag{5.7} \]


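Next, for each $i \in I$, the receiver computes

\[
p_{i,W}(\alpha) = \frac{1}{\alpha} \left(
\prod_{\omega \in W,\ \omega \neq \rho(i)} (\alpha + H(\omega))
- \prod_{\omega \in W,\ \omega \neq \rho(i)} H(\omega) \right)
\tag{5.8}
\]

It is evident that $p_{i,W}(\alpha)$ is a polynomial of degree $t-2$ in the
variable $\alpha$.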


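The decrypting party can compute $h^{\lambda_i p_{i,W}(\alpha)}$ from
$(K_{i,j})$ for $j$ ranging from 1 to $n$. Next, the decrypting party
computes

\[
Y_i = \left( e\left(c_1, h^{\lambda_i p_{i,W}(\alpha)}\right) \cdot
e(D_i, c_2) \right)^{1 / \prod_{\omega \in W,\ \omega \neq \rho(i)} H(\omega)}
= e(g, h)^{s \lambda_i}
\tag{5.9}
\]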


Finally, the decrypting party computes

\[
Y = \prod_{i \in I} Y_i^{\mu_i} = e(g, h)^{s}, \qquad m = \frac{c_0}{Y}
\tag{5.10}
\]

The suggested KP-ABE scheme is correct.

Proof. Assume that $c$ is well-formed, meaning that $c$ was encrypted under
some attribute set $W = \{\omega_i\}_{i=1}^{t}$. Then the decryption steps
above recover $m$.
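The correctness claim can be checked with a short calculation. The sketch below rests on two assumptions that the surviving text does not state explicitly: that each key component has the form $D_i = g^{\lambda_i / (\alpha + H(\rho(i)))}$, and that $p_{i,W}(\alpha) = \frac{1}{\alpha}\bigl(\prod'(\alpha + H(\omega)) - \prod' H(\omega)\bigr)$, where $\prod'$ runs over all $\omega \in W$ with $\omega \neq \rho(i)$. Both forms are suggested by the structure of $c_1$ and $c_2$ in (5.4).

```latex
\begin{align*}
e(D_i, c_2)
  &= e(g, h)^{\frac{\lambda_i}{\alpha + H(\rho(i))} \cdot s \prod_{\omega \in W} (\alpha + H(\omega))}
   = e(g, h)^{s \lambda_i \prod' (\alpha + H(\omega))}, \\
e\bigl(c_1, h^{\lambda_i p_{i,W}(\alpha)}\bigr)
  &= e(g, h)^{-\alpha s \lambda_i p_{i,W}(\alpha)}
   = e(g, h)^{-s \lambda_i \left( \prod' (\alpha + H(\omega)) - \prod' H(\omega) \right)}, \\
e\bigl(c_1, h^{\lambda_i p_{i,W}(\alpha)}\bigr) \cdot e(D_i, c_2)
  &= e(g, h)^{s \lambda_i \prod' H(\omega)}
   \quad\Longrightarrow\quad Y_i = e(g, h)^{s \lambda_i}, \\
Y = \prod_{i \in I} Y_i^{\mu_i}
  &= e(g, h)^{s \sum_{i \in I} \mu_i \lambda_i}
   = e(g, h)^{s}, \qquad
\frac{c_0}{Y} = \frac{m \cdot e(g, h)^{s}}{e(g, h)^{s}} = m .
\end{align*}
```

The last line uses the LSSS reconstruction property $\sum_{i \in I} \mu_i \lambda_i = \beta_1 = 1$, which holds whenever the attribute set $W$ satisfies the access structure.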


Table 5.1 presents a comparison of the efficiency of several ABE schemes
that are currently available, specifically focusing on methods that use
ciphertexts of a fixed size. A strategy has been introduced for CP-ABE and
KP-ABE that uses ciphertexts of constant size. We refer to these as schemes
1 and 2, respectively.

Table 5.1 Previous studies

| Authors | Methods | Contribution | Limitation |
|---|---|---|---|
| Y. Dong et al. [6] | Decentralized consortium blockchain system | Prototype with redactability and access control using chameleon hash, digital signature, and multi-authority attribute-based encryption (MA-ABE). Enhances robustness and protects against central point attacks. Viability demonstrated in practical scenarios. | Limited assessment details on performance, efficiency, and real-world feasibility |
| A. Shafieinejad and Roop Ranjan et al. [7] | Blockchain and ABE for safe cloud file sharing | Innovative collaboration approach ensuring safe cloud file sharing. Uses access polynomial for quick access revocation. Incorporates smart contracts in blockchain for access control. High scalability. | Lack of detailed assessment on performance, efficiency, and real-world feasibility |
| L. Wu and J. Du [8] | Access control for implanted medical devices | Innovative access control strategy for implanted medical devices using proxy devices and ciphertext-policy attribute-based encryption (CP-ABE). Efficient execution on tangible emulator devices. Practical and efficient model demonstrated. | Limited information on scalability and real-world deployment |
| M. Mahdavi et al. [9] | Revised FIBE and KP-ABE techniques | Presents revised versions of fuzzy identity-based encryption (FIBE) and Key-Policy Attribute-Based Encryption (KP-ABE) schemes with reduced pairing processes. Lacks thorough assessment of performance in real-world IoT situations. | Lack of assessment details on performance, efficiency, and feasibility in real-world IoT situations |
| N. Arivazhagan et al. [10] | Cross-trust multiple authorization ABE | Introduces CT-MA-ABE for cross-border institutional authorization using MA-ABE and blockchain certification authority (BCA). Implements encryption-based authorization for privacy. Successful deployment in South China legal jurisdictions. | Successful deployment in specific regions, lack of broader assessment details |
| F. Meng [11] | Improved OOABKS with verifiable security | Thorough examination of current Object-Oriented Attribute-Based Key Sharing (OOABKS) systems. Presents an improved OOABKS technique with verifiable security and efficiency. Maintains flexibility and security. | Lack of information on specific constraints in current OOABKS schemes |
| PR. Kumar et al. [12] | Revocable Attribute-Based Encryption (RABE) | Proposes an efficient RABE technique addressing data integrity and key escrow issues. Security demonstrated using decisional q-parallel bilinear Diffie-Hellman exponent (q-PBDHE) assumption and discrete logarithm (DL) assumption. | Lacks specific information on performance and scalability |
| PR. Kumar et al. [13] | Adaptive Secure Encryption (ASE) | Introduces FEASE, an Adaptive Secure Encryption (ASE) algorithm, providing speed and expressiveness. Enhances security against keyword-guessing attacks. Superior performance compared to existing ASE constructs. | Lacks detailed information on specific limitations and potential vulnerabilities |
| J. Lee et al. [14] | Key Aggregate Searchable Encryption (KASE) | Introduces a system for Personal Health Records (PHRs) that uses key aggregate searchable encryption (KASE) for security, and blockchain with smart contracts for data integrity, audit records, and transparency. Ensures rights of individuals during delegation using attribute tokens. Formal and informal security evaluations conducted. | Lacks detailed information on specific security evaluations, and potential challenges in real-world implementation |
| PR. Kumar et al. [15] | Attribute-Based Threshold Proxy Re-Encryption (AB-TPRE) | Presents attribute-based threshold proxy re-encryption (AB-TPRE) as a solution to the single point of failure in conventional Attribute-Based Proxy Re-Encryption (AB-PRE). AB-TPRE employs multiple proxies with a threshold mechanism for secure transformations. Introduces a share updatable feature to mitigate security risks. | The effectiveness and practicality of the proposed share updatable feature need to be thoroughly tested. Additional assessment details, especially in real-world scenarios, are required. |


5.3.2 Security of BLS and Boneh-Gentry-Lynn-Shacham 
(BGLS) signature 


An aggregate signature system is a cryptographic method that allows for 
the combination of several signatures on multiple messages from multiple 
users into a single concise signature. 

Regrettably, no aggregate signature methods are currently known that are
completely non-interactive without relying on the random oracle heuristic.
Outside that setting, signers need to exchange messages with each other,
sequentially or otherwise, in order to create the signature. Such
interaction may be too expensive for some interesting applications.

Aggregate signatures, as defined by S. Ilakiya et al. [17], are a kind of
digital signature that allows any party to consolidate many signatures on
multiple messages from multiple users into a single concise signature. This
primitive is valuable in scenarios where storage or bandwidth is limited and
the overall cryptographic overhead therefore needs to be minimized.

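Basic BGLS aggregate signature scheme [17]:

\[
\begin{array}{ll}
\textbf{Setup} & G_1 = \langle g_1 \rangle,\ G_2 = \langle g_2 \rangle,\ G_T \text{ of prime order } p \\
 & H: \{0,1\}^{*} \to G_1 \text{ (full-domain hash)} \\
 & e: G_1 \times G_2 \to G_T \text{ (bilinear)} \\
\textbf{KeyGen} & x \xleftarrow{\$} \mathbb{Z}_p \\
 & sk = x \in \mathbb{Z}_p \\
 & pk = (y_1, y_2) = (g_1^{x}, g_2^{x}) \in G_1 \times G_2 \\
\textbf{Sign}(sk, m) & \sigma = H(m)^{x} \in G_1 \\
\textbf{Agg}(\sigma_1, \ldots, \sigma_k) & \sigma_A = \prod_{i=1}^{k} \sigma_i \in G_1 \\
\textbf{AggVer}(\sigma_A, (pk_1, m_1), \ldots, (pk_k, m_k)) & \prod_{i=1}^{k} e(H(m_i), y_{2,i}) = e(\sigma_A, g_2)
\end{array}
\]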
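The algebra of the BGLS scheme above can be traced with a deliberately insecure toy model. As an assumption made only for illustration, every group element is stored as its discrete logarithm, so that the pairing becomes multiplication of exponents and a product of group elements becomes a sum. This exposes the secret keys and offers no security at all; a real implementation needs a pairing-friendly elliptic curve library. All function names are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime group order (assumption)


def hash_to_g1(message: bytes) -> int:
    """Toy full-domain hash: returns the exponent a with H(m) = g1^a."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % P


def pairing(a: int, b: int) -> int:
    """Toy bilinear map on exponents: e(g1^a, g2^b) = gT^(a*b)."""
    return (a * b) % P


def keygen():
    x = secrets.randbelow(P - 1) + 1   # sk = x, non-zero
    return x, (x, x)                   # pk = (g1^x, g2^x), kept as exponents


def sign(sk: int, message: bytes) -> int:
    return (hash_to_g1(message) * sk) % P      # sigma = H(m)^x


def aggregate(signatures) -> int:
    return sum(signatures) % P                 # product in G1 = sum of exponents


def agg_verify(sigma_a: int, pairs) -> bool:
    # Left side: product over i of e(H(m_i), y_{2,i}); right side: e(sigma_A, g2).
    lhs = sum(pairing(hash_to_g1(m), pk[1]) for pk, m in pairs) % P
    return lhs == pairing(sigma_a, 1)          # g2 has exponent 1


messages = [b"reading 1", b"reading 2", b"reading 3"]
keys = [keygen() for _ in messages]
sigma_a = aggregate(sign(sk, m) for (sk, _), m in zip(keys, messages))
pairs = [(pk, m) for (_, pk), m in zip(keys, messages)]
valid = agg_verify(sigma_a, pairs)
```

The aggregate is a single value regardless of how many signers contribute, which is the bandwidth saving that the section describes.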


5.4 ENCRYPTION AND HEALTHCARE 


The first factor that guarantees the integrity and strength of a healthcare
system is the protection of personal health information (PHI). The
introduction of digital technology in healthcare allows for convenient
access to this data, which facilitates the provision of more efficient and
effective health services and therapy. Presently, healthcare professionals
widely use electronic health record (EHR) technology. Most medical offices
have moved their EHR systems [18] to outsourced options, in which EHR data
is kept in centralized data warehouses on a third-party cloud that may not
be fully reliable or secure.

Healthcare organizations must be able to monitor and record user entry into
their network. To do this, it is important to give each user a distinct user
ID for accessing the organization’s network. Securing healthcare networks is
a very effective strategy for preventing breaches. Malicious actors who
breach an organization’s internal network can spread malware across the
whole of the organization’s system, infecting every device that connects to
the network. Failure to adopt effective network management may therefore
harm the organization.

Data backup and disaster recovery refer to the systematic practice of cre- 
ating duplicate copies of crucial information and establishing a well-defined 
strategy for retrieving and restoring these files in the event of a catastrophe. 
HIPAA mandates the replication and storage of ePHI at an external facility. 
Healthcare organizations must also establish a disaster recovery strategy 
and provide a means of accessing data during emergency scenarios. In order 
to adhere to HIPAA regulations, it is essential to regularly create backups of 
data to mitigate the risk of patient data loss. 


5.5 RESULTS AND DISCUSSION 


To encrypt the data, the CP-ABE encryption technique, shown in Figure 5.4,
must choose one random element from $\mathbb{Z}_p$ (while building the
polynomial p) and run two exponentiations in G (an elliptic curve group) for
each attribute. The KP-ABE encryption technique [19], however, only has to
perform two exponentiations in G for each attribute, and it needs to choose
only one random element from $\mathbb{Z}_p$ across all of the attributes.
Consequently, encryption with KP-ABE [20] is about three times faster than
encryption with CP-ABE [21].

In terms of decryption, the costliest calculation is the bilinear pairing,
which is performed twice in CP-ABE and just once in KP-ABE for each leaf
node. Consequently, decryption with KP-ABE is about twice as fast as
decryption with CP-ABE [22]. However, although it is more computationally
efficient than CP-ABE in all three dimensions, KP-ABE has two notable
drawbacks. First, the sizes of the public key and master key grow in
proportion to the number of attributes in the universe. This can increase
the communication overhead of the system, particularly when users are often
required to update their public key. Figures 5.5 and 5.6 show the
relationship between the sizes of the public key and master key and the
number of attributes in the universe. In our method, a numerical attribute
is represented by at most 64 distinct attributes. The attribute universe of
the current KP-ABE [23] in a PHR system may be rather extensive. However,
the limitation on key size can be relaxed by using a large-universe
construction of KP-ABE.

Figure 5.4 Enhanced Key-Policy Attribute-Based Encryption public and master key size.

Table 5.2 compares the schemes by the size of the private key, the size of
the ciphertext, and the number of pairing evaluations during encryption and
decryption.

An intriguing open problem is the development of a KP-ABE (see Figures 5.4
and 5.5) [24] method that produces ciphertexts of a fixed size while
remaining secure under a standard assumption or attaining a stronger
security definition such as full security. A further challenge is the
construction of a KP-ABE scheme in which both the ciphertext and the private
key have constant size. We provide a synchronized aggregate signature
construction in the random oracle model that is more efficient than our
standard model construction. Additionally, it has features that may make it
more appealing for certain applications than current random oracle schemes.
Synchronized aggregate signatures [17,25] may decrease the bandwidth demands
that message signing places on a network. When numerous signatures need to
be sent to the collector, intermediate routing nodes can aggregate
signatures at any point instead of carrying all of the signature data.

Figure 5.6 Data query number of classes.


The data retrieval time when numerous data consumers are involved is
particularly interesting. Figure 5.6 demonstrates that the time it takes to
query data in a multi-client setting is directly proportional to the number
of attributes. However, this is not the case in a single-client setting. The
number of clients has a direct impact on the total data query time. In a
nationwide deployment of a PHR system, the number of clients using the
system concurrently may reach hundreds of thousands. Such circumstances
might result in a significant delay in query response time for the system.

Table 5.2 Comparisons among ABE schemes

                   [14]          [15]                 [20]               [19]          Enhanced KP-ABE
ABE type           CP-ABE        CP-ABE               CP-ABE             KP-ABE        KP-ABE
Access structure   AND           Threshold            Threshold          Monotone      Monotone
Private key        2|G|          w|G_1|+n|G_2|        (n+w)|G|+|Z_p|     (n+1)t|G|     t|G_1|+nt|G_2|
Ciphertext         2|G|+|G_T|    |G_1|+|G_2|+|G_T|    2|G|+|G_T|         2|G|+|G_T|    |G_1|+|G_2|+|G_T|
Encryption cost    0             0                    1p                 1p            1p
Decryption cost    2p            3p                   3p                 2p            2p


5.6 CONCLUSION 


This research presents a novel Enhanced KP-ABE scheme that can handle any
monotonic access structure. The approach has the advantage of producing
ciphertexts of a fixed size. Additionally, we have demonstrated that the
proposed scheme achieves semantic security in the selective-set model,
relying on the general Diffie-Hellman exponent assumption. An inherent
drawback of the proposed Enhanced KP-ABE system is that the size of private
keys increases significantly with the number of attributes in the access
structure. An interesting open problem is the development of an Enhanced
KP-ABE system that produces ciphertexts of a fixed size while remaining
secure under a standard assumption or attaining a stronger level of
security. A further challenge is the creation of an Enhanced KP-ABE system
in which both the ciphertext and the private key have constant size.


5.7 FUTURE WORK 


Strengthen the ABE library. Our ABE library suffers from a number of
constraints and deficiencies, including limited support for complex access
structures, inefficiencies in key generation and encryption processes,
inadequate handling of large attribute sets, and a lack of optimization for
scalability and performance in real-world applications. The security
advantage is considerably diminished by the amount of time that is required
to use the ABE library. Because of this, we are going to try to incorporate
new ABE constructions into our ABE library in order to improve its overall
performance.

Investigate ways to represent access hierarchies that are both more dynamic
and more effective. At the moment, we define access structures statically,
which is not appropriate for a dynamic system such as PHR. Additionally, a
more effective way of description has the potential to reduce the complexity
of both key management and policy.

Combine cryptographic techniques with additional methods that enhance
privacy. Cryptographic algorithms are a crucial way of protecting private
data from cloud servers that are only partly trustworthy; however, they are
not the only option available. In light of this, we are working to discover
a more effective method of addressing the problem of privacy and security in
PHR systems.
Chapter 6


Machine learning unleashed: A paradigm shift in blockchain intelligence


Marepalli Radha, MD Asma, Ashlesha Kolarkar, 
and Kuncham Sreenivasa Rao 


6.1 INTRODUCTION 


Innovative applications, including connected health care, smart cities, and 
connected industries, have been developed as a result of the rapid develop- 
ment of emerging technologies, smartphones, sensors, and 5G communi- 
cation. This has contributed to the production of enormous quantities of 
data. There are legitimate security issues regarding the estimated 50.1% 
increase in the number of Internet of Things (IoT) devices linked to the 
Internet by 2020, according to a study conducted by the National Cable 
& Telecommunications Association (NCTA) [1]. Cyberattacks and data 
breaches, particularly targeting IoT devices, have surged, with McAfee 
reporting a barrage of incidents across various industries since January 
2018. The vulnerability of IoT devices is exacerbated by their increasing 
interconnectivity, as highlighted by VDC Research Group Inc., which 
identifies security requirements as a significant obstacle in developing 
connected devices [2]. Kaspersky Lab’s data reveals a substantial rise in 
malware samples for IoT devices from 2016 to 2018, underscoring the sub- 
stantial vulnerabilities in these devices. Monitoring network-based risks 
presents substantial issues for a variety of industries, including the gov- 
ernment, energy, healthcare, banking, and research centres, among others. 
The enormous volume, velocity, variety, and veracity of the data make
current methods and technologies insufficient to identify novel cyberattacks
on IoT devices [3]. When dealing with massive amounts of data, weekly or
monthly security analytics reports are not sufficient. The research
acknowledges these shortcomings and suggests combining deep learning with
big data to strengthen the security of IoT devices. Deep learning is
well-suited for networks with limited resources because of its compression
characteristics, its ability to train without supervision, and its lack of
human intervention in feature building. Although deep learning, big data,
and IoT security have each received independent attention in the current
literature, this study intends to thoroughly investigate the practicability
of integrating these technologies. By evaluating data flow to discover
intrusions and attack patterns, the deep extreme learning machine (DELM)
addresses the limitations of the standard signature-based technique in
Intrusion Detection Systems (IDS) [4].

The integration of smart blockchain-based applications and robust algo- 
rithms for data processing is essential to manage the substantial data gen- 
erated by IoT devices. Machine learning, as part of artificial intelligence 
(AI) frameworks, involves training machines to make predictions and deci- 
sions based on statistical analysis [5]. This research advocates for the use 
of DELM to enhance the security of IoT-enabled smart homes, providing a 
thorough review of cutting-edge technologies and proposing an architecture 
for implementation in blockchain-based smart homes. The DELM frame- 
work, in conjunction with blockchain, offers a unique solution for various 
applications, including fraud detection and theft prediction, by focusing on 
specific chain segments and eliminating data-related issues. 

The organization of this article unfolds as follows: The subsequent sec- 
tion provides a concise summary of relevant survey articles. We then delve 
into the foundational framework of blockchain, elucidate the application of 
the DELM approach within the context of blockchain-based smart homes, 
and delineate the framework for smart home applications. Subsequently, 
we examine the simulation and testing procedures employed for the DELM 
approach. The final section of the article delves into the research conclusions. 


6.2 LITERATURE SURVEY 


This segment offers an in-depth depiction of deep learning, big data tech- 
nologies, and IoT security. Moreover, it explores the interconnections 
between these three realms, aiming to furnish essential insights and a map- 
ping of the relationships among these cutting-edge subjects. 


6.2.1 Deep learning 


Deep Learning, a subset of machine learning, encompasses three distinct 
learning techniques: supervised, semi-supervised, and unsupervised learn- 
ing. It is characterized by the presence of multiple layers in artificial neu- 
ral networks (ANNs), each comprising neurons with activation functions 
capable of generating non-linear outputs. This approach draws inspiration 
from the structure of neurons in the human brain [6]. Over recent years, 
deep learning has garnered significant attention from researchers and orga- 
nizations, surpassing the interest in traditional machine learning methods. 
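The layered structure described above can be made concrete with a minimal forward pass. This is only a sketch: the layer sizes, the weights, and the choice of a sigmoid activation are arbitrary assumptions for illustration, and no training is shown.

```python
import math


def sigmoid(x: float) -> float:
    """Non-linear activation that squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One layer: each neuron applies the activation to a weighted sum plus a bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    out = inputs
    for weights, biases in layers:
        out = dense(out, weights, biases, sigmoid)
    return out


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
LAYERS = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]

output = forward([1.0, 2.0], LAYERS)
```

Because of the activation functions, doubling the inputs does not double the output, which is the non-linearity that the paragraph refers to.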

The authors of Ref. [7] compared deep learning with four other machine
learning methods, namely Support Vector Machine (SVM), Decision Trees,
K-means, and Logistic Regression, using Google Trends to carry out their
investigation. The results point to deep learning’s rising profile. Natural
language processing, search engines, information retrieval, image
recognition, and image retrieval are just a few of the AI disciplines that
have made use of this technology. There are four steps to building a model
in both deep learning and machine learning. Figure 6.1 shows how deep
learning differs from machine learning. In this section, we go over the
fundamentals of deep learning before turning to its common techniques and
traits.

Figure 6.1 Machine learning vs. deep learning.


6.2.2 Big data technologies 


Big data describes information that requires new ways of processing to gain
insights and make decisions because of its large volume, high velocity, and
diversity [8]. Figure 6.2 depicts the six defining features, or “the 6 Vs,”
that are usually associated with big data. The first three Vs (volume,
velocity, and variety) are requirements for data to be considered big data.
Big data technology refers to the tools and technologies used to process
large amounts of data efficiently. Apache Hadoop [9], Apache Spark [10],
Apache Storm [11], Apache Flink [12], Apache Cassandra [13], and Apache
HBase [14] are all examples of such technologies.

With the help of the IoT, smart home sensors and other gadgets can
communicate with one another and share data across different systems. Smart
cities, homes, offices, retail outlets, agriculture, water management,
transportation, healthcare, and energy are just a few examples of the
intelligent systems that have recently benefited from the extensive use of
the IoT [15]. Data capture in the IoT world is facilitated by mobile
devices, transportation facilities, public areas, and domestic appliances,
all of which rely heavily on the IoT. In addition, the IoT network allows
devices to be controlled remotely across different applications, which in
turn allows them to communicate with one another and with central
controlling devices. The IoT enables the gathering of geographical,
astronomical, environmental, and logistical data, among others, across a
wide range of disciplines [16]. Protecting the entire IoT deployment
architecture from possible attacks is what IoT security is all about [17].
There are several factors to consider when developing strong IoT security
solutions. By leveraging the capabilities of deep learning and big data
technology, we can detect a spectrum of security breaches linked with these
criteria. Figure 6.2 represents the 6 Vs of big data.

Figure 6.2 Six Vs of big data.

Confidentiality plays a crucial role in ensuring secure information
transmission in all communications. When information is transmitted without
proper authentication or encryption, it is exposed to privacy violations by
adversaries [18]. In the context of big data technologies, secure data
transmission is typically achieved through encryption, which prevents
unauthorized access to and compromise of data by adversaries [19]. Integrity
is vital for preserving the trustworthiness of an IoT system, as adversaries
may attempt to compromise it. Integrity means that the data received has not
been altered during transmission [20]. It is worth mentioning that Apache
Spark, a big data platform, allows users to perform integrity checks on the
IoT system by supporting data quality checks on Spark DataFrames [21]. When
discussing IoT systems, “availability” means keeping the system accessible
to authorized users while blocking access by unauthorized ones [22]. This
objective is in line with big data technologies, which prioritize
accessibility to users and the capacity to run on numerous nodes, so that
applications remain highly available [23].
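The integrity property described above can be sketched with a keyed hash. The example assumes that a sensor and its receiver share a secret key; the key, the payload, and the function names are hypothetical, and HMAC-SHA-256 is used here as one common choice rather than as the method of the cited works.

```python
import hashlib
import hmac


def tag_reading(key: bytes, payload: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag that travels alongside the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_reading(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag on receipt and compare it in constant time."""
    return hmac.compare_digest(tag_reading(key, payload), tag)


key = b"shared-device-key"  # hypothetical pre-shared key
payload = b'{"sensor": "thermostat-1", "temp_c": 21.5}'
tag = tag_reading(key, payload)

unchanged_ok = verify_reading(key, payload, tag)
altered_ok = verify_reading(key, payload.replace(b"21.5", b"35.0"), tag)
```

A payload that was modified in transit fails verification, while an unmodified one passes. Confidentiality would additionally require encrypting the payload, which this sketch does not do.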


6.3 METHODOLOGY 


Simple blockchain cryptocurrencies like bitcoin were introduced by Satoshi
Nakamoto in 2008 as part of a peer-to-peer payment network that aimed to
eliminate middlemen and solve the problem of double spending. The system
functions as a chained data structure, in which each data block is
authenticated by the Secure Hash Algorithm (SHA)-256 together with the hash
of the previous block. The basic components of a block are the block number,
the hash of the previous block, details about transactions, a nonce, and a
timestamp. While nonces are considered random variables, timestamps are
considered continuous variables. Nodes in the network that validate and mine
data continually solve cryptographic puzzles by hashing together static data
(blocks) and dynamic data (timestamps and nonces) to produce a number with
many consecutive leading zeroes [24]. The miner who finds the right hash
value wins the right to place the block into the blockchain. The proof of
work mechanism is used to ensure that a block is legitimate.
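The block structure and the search for a hash with leading zeroes can be sketched as follows. The field names, the JSON encoding, and the difficulty of three leading hexadecimal zeroes are assumptions chosen to keep the example fast; they do not reproduce Bitcoin's actual block format or difficulty rule.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block fields."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def mine(number: int, previous_hash: str, transactions: list, difficulty: int = 3) -> dict:
    """Increase the nonce until the block hash starts with `difficulty` zero hex digits."""
    block = {
        "number": number,
        "previous_hash": previous_hash,
        "transactions": transactions,
        "timestamp": time.time(),
        "nonce": 0,
    }
    prefix = "0" * difficulty
    while not block_hash(block).startswith(prefix):
        block["nonce"] += 1
    return block


genesis = mine(0, "0" * 64, ["genesis"])
second = mine(1, block_hash(genesis), ["sensor-7 reports 21.5C"])
```

Because each block stores the hash of its predecessor, changing any field of `genesis` changes its hash and breaks the link held in `second`.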

Like miners in a blockchain system, every node in a smart home that is
linked to an IoT device communicates with a memory pool. All transactions
that are waiting to be included in a new block of the blockchain are stored
in this memory pool. Transactions are validated and summarized using a
Merkle tree. Miners throughout the smart home system can then see the valid
transactions that have been added to the block. To create the hash of the
block, miners adjust the nonce and timestamp, and the program then checks
whether the generated hash meets the target [25]. This procedure repeats for
as long as the hash value is greater than or equal to the target value. Once
the hash value is smaller than the target value, the miner has produced a
valid block: the proof of work is validated, and the block and its hash are
attached to the chain. A message is then sent to all nodes in the network to
let them know that the memory pool transactions are complete. Because of its
adaptability and interoperability with smart home IoT applications,
blockchain technology is having an ever-increasing impact on the smart home
communication environment [26]. The four-layer blockchain-based smart home
network shown in Figure 6.1 consists of an IoT data source layer, a
blockchain network layer with DELM capabilities, a layer for intelligent
home devices, and a client node.
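The way a Merkle tree summarizes the pending transactions can be sketched in a few lines. The example assumes a single SHA-256 per node and duplicates the last hash when a level has an odd number of entries; both are simplifying conventions chosen for illustration (Bitcoin, for instance, applies SHA-256 twice), and the transaction strings are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> str:
    """Hash the transactions pairwise, level by level, until one root remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


pool = ["lamp on", "door locked", "thermostat 21.5C"]
root = merkle_root(pool)
```

Changing any single transaction in the pool changes the root, so a block only needs to store this one value to commit to every transaction it contains.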


6.3.1 Integration of deep extreme learning 
machine in blockchain-based smart home 


Blockchain was introduced in 2008 as the ledger underlying the Bitcoin
peer-to-peer payment network [27], which removes the need for a trusted
third party and prevents double-spending. The ledger is a chained data
structure in which each block is authenticated by a SHA-256 hash that also
covers the hash of the previous block. A basic block contains a block
number, the hash of the previous block, transaction information, a nonce,
and a timestamp. Miners and validators solve a cryptographic puzzle: they
hash the static block data together with the dynamic fields until the
resulting hash begins with a required number of consecutive leading zeroes.
The miner who first finds such a hash wins the round and is allowed to
insert the block, which the other nodes then validate under the proof-of-work
mechanism. Every IoT device in a smart home communicates with a shared
memory pool in the same way that blockchain miners do. This memory pool
stores all of the transactions that will be part of the next block on the
blockchain. The transactions are validated and summarized using a Merkle
tree [28], after which miners throughout the smart home system can see the
valid transactions that have been added to the block. To compute the block
hash, miners vary the nonce and timestamp and check whether the generated
hash meets the target. The procedure repeats for as long as the hash value
is greater than or equal to the target. Once the hash value is smaller than
the target, the proof of work is valid and the block is attached to the
chain. A message is then sent to all nodes in the network to let them know
that the memory pool transactions are complete.
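The block structure and mining loop described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the field names, the `difficulty` parameter, and the use of a hexadecimal zero prefix in place of a numeric target are all assumptions made for the example.

```python
import hashlib
import json
import time


def merkle_root(transactions):
    """Summarize a list of transaction strings into one SHA-256 Merkle root."""
    level = [hashlib.sha256(tx.encode()).hexdigest() for tx in transactions]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def block_hash(block):
    """SHA-256 over the block fields, serialized in a fixed key order."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def mine_block(number, prev_hash, transactions, difficulty=3, timestamp=None):
    """Vary the nonce until the hash starts with `difficulty` zero hex digits."""
    block = {
        "number": number,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root(transactions),
        "timestamp": time.time() if timestamp is None else timestamp,
        "nonce": 0,
    }
    prefix = "0" * difficulty  # hypothetical stand-in for the numeric target
    while not block_hash(block).startswith(prefix):
        block["nonce"] += 1
    return block


def is_valid(block, prev_hash, difficulty=3):
    """A peer re-hashes the block once to check the link and the proof of work."""
    return (block["prev_hash"] == prev_hash
            and block_hash(block).startswith("0" * difficulty))
```

Mining is expensive because many nonces must be tried, whereas `is_valid` needs a single hash. That asymmetry is what lets every other node check a proposed block cheaply.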

Because of its adaptability and interoperability with smart home IoT
applications, blockchain technology is having an ever-increasing impact on
the smart home communication environment. The four-layer blockchain-based
smart home network shown in Figure 6.1 consists of an IoT data source layer,
a blockchain network layer with DELM capabilities, a layer for intelligent
home devices, and a client node. The IoT data source layer collects critical
data for assessing smart homes, their surroundings, and their users from
various devices, which fall into three main types: sensors, multimedia
devices, and medical devices. Thermostats and other sensors in the IoT
sensor network measure and control environmental factors, while
closed-circuit television and wearables are part of the same network. The
first layer of the stack consists of central databases or repositories that
gather information from these nodes; one example is a blockchain. Applying
DELM computing technology can make blockchain-based applications smarter:
DELM can improve both the security of the distributed ledger and the
efficiency of information-sharing routes. It also opens the door to
enhancing frameworks through blockchain technology's decentralized design.

By running the datasets used by DELM models across a blockchain network,
the proposed DELM framework reduces the impact of data problems such as
duplicates, missing values, errors, and noise. The framework can focus on
particular segments of the chain rather than the full dataset, which yields
distinct models for a range of applications, including fraud detection [29].
The model is built on top of the IoT edge architecture and has three main
components: smart contracts, the DELM layer, and the blockchain information
architecture. The smart home framework includes services such as digital
markets, access control, healthcare-home integration, intelligent community
services, and automated infrastructure payments. At the very top of the
hierarchy is the access layer, which allows third parties such as
microgrids, retailers, and utility companies to use smart home devices built
on the blockchain.


Figure 6.3 Blockchain-based smart home management system empowered with deep 
extreme learning machine. 


Smart home gadgets, including cameras, CCTV, smart TVs, fitness trackers,
cellphones, and actuators, form the backbone of improved smart home
environments. These devices provide features such as remote control, alarm
generation, and safety surveillance. To keep smart home operations running
smoothly and to identify suspicious activity, a personalized access control
system is essential. The access permission specifications for the IoT system
are held in a collection of immutable, distributed access records for every
user [30].

Figure 6.3 shows how blockchain guarantees secure access, using as an
example a home user (Admin) with individualized access to the smart house
and its applications. After determining the appropriate level of access,
users must register it with the home server. Homeowners (as Admins) have
full control, whereas minors, guests, and strangers have intermediate or
lower-level permissions. Whenever a user requests authentication, the home
server checks the access control directory and then notifies the blockchain
layer. The authorization list for the various users and gadgets is stored in
a blockchain policy header. After an administrator approves or rejects an
access request, blockchain miners incorporate the policy details into the
header and act on them, protecting the network from harmful attacks.
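The access flow above can be illustrated with a small sketch. It assumes hypothetical numeric permission levels for the roles named in the text and models the policy header as a hash-chained, append-only list of decisions. The class and function names are invented for the example and do not come from the chapter.

```python
import hashlib
import json

# Hypothetical permission levels; the text names the roles but not the values.
ACCESS_LEVELS = {"admin": 3, "minor": 2, "guest": 1, "stranger": 0}
FIELDS = ("user", "device", "decision", "prev")


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class PolicyLedger:
    """Append-only list of access decisions, each chained to the previous one."""

    def __init__(self):
        self.records = []

    def append(self, user, device, decision):
        prev = self.records[-1]["hash"] if self.records else "0" * 64
        body = {"user": user, "device": device, "decision": decision, "prev": prev}
        self.records.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash; any edited record breaks the chain."""
        prev = "0" * 64
        for rec in self.records:
            body = {key: rec[key] for key in FIELDS}
            if rec["prev"] != prev or rec["hash"] != _digest(body):
                return False
            prev = rec["hash"]
        return True


def request_access(ledger, directory, user, device, required_level):
    """Check the access directory, then record the decision on the ledger."""
    role = directory.get(user, "stranger")  # unknown users are strangers
    approved = ACCESS_LEVELS[role] >= required_level
    ledger.append(user, device, "approve" if approved else "reject")
    return approved
```

Because each record stores the hash of its predecessor, changing an earlier decision after the fact makes `verify()` fail, which is the immutability property the text relies on.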


6.3.2 Deep extreme learning machine


The DELM is useful in many areas, such as predicting health problems,
estimating building energy consumption, transportation, and traffic
control [1]. As described in Ref. [13], the DELM stands out for its fast
learning and rapid convergence, in contrast to conventional ANN algorithms,
which frequently require many adjustments and lengthy learning cycles. These
properties make the DELM relevant and useful in many domains for regression
and classification purposes.

As a feedforward neural network, the ELM usually permits data to flow in
only one direction through its layers. During learning, however, the
proposed system uses backpropagation. With this method, error information is
fed back into the network, letting the neural network fine-tune its weights
to minimize the error. During validation, when the trained model is used to
make predictions on real data, the weights stay fixed. The proposed DELM
consists of an input layer, two or more hidden layers, and an output layer.
In contrast to a typical extreme learning machine, which has a single hidden
layer with many neurons, a DELM has several hidden layers with a fixed
number of neurons each. This setup makes the network more efficient. As seen
in Table 6.1, adding hidden layers while keeping the number of neurons
constant allowed the proposed method to outperform the other machine
learning methods considered.

DELM combines feedforward and backpropagation passes to adjust the network
weights and minimize the error rate. Compared with other machine learning
techniques, the DELM framework proves to be more accurate. To optimize smart
home security, the assessment layer tracks several statistical measures, as
shown in Figure 6.4, including precision, sensitivity, specificity, positive
prediction value, and false positive rate. Weight initialization, the
feedforward pass, backward propagation of errors, and updating

Table 6.1 Comparison of the proposed DELM method with other machine
learning algorithms with different datasets


Method                            NSL-KDD dataset [14]   KDD-CUP-99 dataset [15]
                                  accuracy (%)           accuracy (%)
Artificial Neural Network (ANN)   82.3                   91.49
Support Vector Machine (SVM)      68.41                  88.44
Decision Tree (DT)                82.7                   92.24
Proposed DELM                     94.87                  95.7

NSL-KDD: Network Security Laboratory - Knowledge Discovery and Data Mining.
KDD-CUP-99: Knowledge Discovery and Data Mining Cup.


Figure 6.4 Performance evaluation of blockchain-based smart home empowered with
deep extreme learning machine system model during the prediction of
malicious activity or attacks using different statistical measures.


the weights and biases are all parts of the backpropagation process. Each
hidden-layer neuron applies a sigmoid activation function, which shapes the
design of the DELM hidden layers. If the squared difference between the
output and the input is less than two, then the hidden layer is doing well.
To reduce the network's overall error, weight updates are crucial.
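The training cycle described above (weight initialization, feedforward pass, backward propagation of errors, weight and bias update) can be sketched in plain Python. This is a generic sigmoid network trained by gradient descent on half the sum of squared errors, offered as an assumption about the general technique rather than the authors' DELM code. The function names, the learning rate, and the layer sizes are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_network(n_inputs, n_hidden_layers, n_hidden, n_outputs, seed=0):
    """Several hidden layers with the same number of neurons each."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [n_hidden] * n_hidden_layers + [n_outputs]
    # Each neuron is a list of input weights followed by one bias term.
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(sizes[i] + 1)]
         for _ in range(sizes[i + 1])]
        for i in range(len(sizes) - 1)
    ]


def forward(network, x):
    """Feedforward pass; returns the activations of every layer, input first."""
    acts = [list(x)]
    for layer in network:
        prev = acts[-1]
        acts.append([
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], prev)) + neuron[-1])
            for neuron in layer
        ])
    return acts


def train_step(network, x, target, lr=0.5):
    """One feedforward pass, backward error propagation, and weight update."""
    acts = forward(network, x)
    out = acts[-1]
    error = 0.5 * sum((t - o) ** 2 for t, o in zip(target, out))
    # Output-layer error signal for sigmoid units: (o - t) * o * (1 - o).
    deltas = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
    for li in range(len(network) - 1, -1, -1):
        layer, prev = network[li], acts[li]
        below = None
        if li > 0:
            # Error signal for the layer below, using the weights before update.
            below = [
                sum(d * neuron[j] for d, neuron in zip(deltas, layer))
                * prev[j] * (1.0 - prev[j])
                for j in range(len(prev))
            ]
        for d, neuron in zip(deltas, layer):
            for j in range(len(prev)):
                neuron[j] -= lr * d * prev[j]
            neuron[-1] -= lr * d
        deltas = below
    return error
```

Calling `train_step` repeatedly on a sample drives the returned error down. During validation only `forward` is called, so the weights stay fixed, as the text describes.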


6.4 RESULTS AND DISCUSSION 


In this study, the proposed framework implemented the DELM using input
data from Ref. [14]. The dataset was randomly partitioned into 85% for
training (125,973 samples) and the remaining 15% for validation (22,543
samples). The data were preprocessed beforehand to eliminate irregularities
and minimize the potential impact of data errors. DELM was employed to
detect malicious activities or intrusions across various configurations of
hidden layers, hidden connections, and activation functions. The evaluation
assessed different numbers of neurons in the hidden layers and various types
of activation functions. The efficiency of the DELM system was evaluated in
this analysis. To compare the output with other algorithms, various
statistical measures were employed. Table 6.2 presents the intrusion
detection model predictions during the training phase for the proposed
blockchain-based smart home system empowered by the DELM. The training
dataset comprised a total of 125,973 samples, with 67,343 samples
representing normal instances and 58,630 samples representing attacks.

During the analysis, it was found that 66,423 samples from the normal 
class (indicating instances with no detected attacks) were accurately pre- 
dicted, while 1,867 records were erroneously predicted as an attack, despite 
no actual attack occurring. Similarly, in the case of detected attacks, out 
of a total of 58,630 samples, 57,620 were correctly identified as attacks, 
while 2,320 samples were inaccurately predicted as normal instances, even 
though an attack was present. Table 6.3 illustrates the intrusion detection 
model predictions for the proposed blockchain-based smart home system 
empowered by the DELM during the validation phase. The validation data- 
set consisted of 22,543 samples, with 9,610 representing normal instances 
and 12,733 representing attacks. 

Furthermore, additional statistical measures have been incorporated to 
predict values, including false positives, false negatives, likelihood ratios 
(negative and positive), as well as positive and negative prediction values. 


Table 6.2 Predicting harmful actions or assaults and training a
model for a blockchain-enabled smart home using
deep extreme learning


Proposed DELM-based system model (85% of sample data in training)

Total number of samples (N = 125,973)

                                      Output results (O0, O1)
Input   Expected output (T0, T1)      O0 (normal)    O1 (attack)
        T0 = 66,423 normal            65,256         1,867
        T1 = 57,620 attack            2,320          56,120


Table 6.3 Predicting harmful actions or assaults using a blockchain-based
smart home equipped with a deep extreme learning machine
system model


Proposed DELM-based system model (15% of sample data in validation)

Total number of samples (N = 22,443)

                                      Output results (O0, O1)
Input   Expected output (T0, T1)      O0 (normal)    O1 (attack)
        T0 = 9,610 normal             9,337          463
        T1 = 12,733 attack            878            11,835
The outcomes of these measures are presented in Figure 6.4. For the
validation phase, the analysis reveals that 9,237 samples from the normal
class were accurately predicted, while 473 records were erroneously
predicted as an attack despite the absence of any actual attack. Similarly,
in the case of detected attacks, out of a total of 12,733 samples, 11,935
were correctly identified as attacks, while 898 samples were inaccurately
predicted as normal instances when an attack was present. Figure 6.4
illustrates the performance of the proposed blockchain-based smart home
system empowered by the DELM in terms of various statistical measures during
both the training and validation phases. The results indicate that during
training, the proposed system achieves an accuracy of 95.7% and a miss rate
of 3.49%.

In the validation phase, the proposed system achieves an accuracy of
94.87% and a miss rate of 6.09%. The system model performance is also
reported in terms of sensitivity and specificity during both the training
and validation phases. During training, the proposed system achieves 96.43%
sensitivity and 96.6% specificity, while during validation, it achieves
91.14% sensitivity and 96.19% specificity.
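The statistical measures named in this section can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions and treats "attack" as the positive class. Both choices are assumptions: the chapter does not state its formulas, and "miss rate" is taken here to mean the false negative rate. The counts in the usage note are hypothetical and are not taken from Tables 6.2 or 6.3.

```python
def detection_metrics(tp, fn, fp, tn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp: attacks predicted as attacks    fn: attacks predicted as normal
    fp: normal predicted as attacks     tn: normal predicted as normal
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    false_positive_rate = fp / (fp + tn)
    miss_rate = fn / (fn + tp)            # false negative rate (assumed meaning)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "miss_rate": miss_rate,
        "false_positive_rate": false_positive_rate,
        "positive_prediction_value": tp / (tp + fp),
        "negative_prediction_value": tn / (tn + fn),
        "positive_likelihood_ratio": sensitivity / false_positive_rate,
        "negative_likelihood_ratio": miss_rate / specificity,
    }
```

For example, with hypothetical counts `detection_metrics(tp=95, fn=5, fp=10, tn=90)`, the accuracy is 185/200 = 0.925, the sensitivity is 0.95, and the specificity is 0.90. The likelihood ratios are undefined when the false positive rate or the specificity is zero, so a production version would need to guard those divisions.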


6.5 CONCLUSION 


Evaluation and prediction present formidable obstacles to the detection of
intrusions in smart homes. New developments in blockchain technology and AI
hold great potential for tackling these issues, yet such solutions are
difficult to execute efficiently because of the power and processing limits
of devices in most smart home configurations. This research sought to
address this gap by presenting a lightweight approach to intrusion
prediction and detection that makes use of a blockchain-based architecture
and DELM. The proposed solution was evaluated using several statistical
measures; the findings show that the DELM method is more accurate than the
other methods compared. The proposed DELM method achieved an accuracy of
94.81%. Additional datasets and architectures are being considered as part
of the ongoing investigation into these promising results.


Chapter 7


Securing HPC data clusters 
with in-memory blockchain 


A provenance enhancement approach 


Maradana Durga Venkata Prasad and Srikanth T 


7.1 INTRODUCTION 


Blockchain represents a distributed storage paradigm that incorporates var- 
ious technologies, including encryption algorithms, peer-to-peer networks, 
and consensus algorithms [1]. Recognized as a revolutionary innovation, 
its decentralization, tamper-proof information, and traceability features 
have made it a disruptive force in recent years. Despite its transformative 
potential, large-scale commercial blockchain applications demand a high- 
performance and scalable storage architecture, surpassing the capabilities 
of traditional public blockchain architectures [2]. This has led to the emer- 
gence of consortium blockchains, specifically designed to handle extensive 
transactions in business scenarios, offering superior storage performance 
compared to public counterparts [3]. 

However, existing consortium blockchain platforms face storage bottlenecks,
primarily rooted in their reliance on key-value databases like LevelDB. As
data volumes increase, this reliance results in significant read/write
amplification and constant compaction, diminishing storage efficiency and
causing performance bottlenecks. Furthermore, key-value databases prove
unsuitable for handling large individual data items, leading to exponential
increases in storage delays as transaction data size grows. Efforts to
enhance consortium blockchain storage performance have
mainly focused on reducing data storage volume rather than restructuring 
the underlying architecture [4]. Approaches such as compressing node data 
and collaborative data storage aim to minimize data in blockchain nodes. 
However, excessive data reduction risks compromising the integrity of 
blockchain records. Some researchers propose distributed extensions, par- 
ticularly employing distributed file systems, to address massive data storage 
challenges. Yet, these architectures prioritize storage extension over perfor- 
mance, failing to meet the efficiency needs of enterprise-level business sce- 
narios requiring rapid and reliable reading/writing capabilities (Figure 7.1). 

Figure 7.1 Conventional blockchain architecture on shared-nothing platforms.


Provenance systems traditionally fall into two categories: centralized and
distributed. SPADE, a prominent centralized system, manages provenance
collected from various sources using a centralized relational database man- 
agement system (RDBMS). Domain-specific variations following similar 
centralized design principles are prevalent in fields like biomedical engi- 
neering and computational chemistry [5]. However, the exponential growth 
of data has led to criticism of centralized systems. They become perfor- 
mance bottlenecks and single points of failure. This has prompted the 
development of distributed approaches for scalable provenance. Distributed 
provenance systems, often built on distributed file systems rather than 
centralized databases, address the performance limitations of centralized 
counterparts. They demonstrate significantly higher performance, but this 
shift introduces a new concern: who should audit the provenance in dis- 
tributed systems? This question raises the issue of building the provenance 
of provenance, creating an endless recursion. While this concern was less 
critical in centralized approaches due to the application of robust reliabil- 
ity mechanisms to a centralized node, it poses a significant challenge in 
large-scale distributed systems [6]. In such a setup, if any single node is 
compromised, the entire provenance becomes invalid. To tackle this chal- 
lenge, recent advancements in distributed provenance systems draw inspira- 
tion from blockchain technology (Figure 7.2). 

Figure 7.2 Proposed high-performance computing blockchain architecture with shared
storage.


To keep track of where data comes from in high-performance computing (HPC)
systems, this study presents a new blockchain architecture tailored to the
specific requirements of the HPC environment. Importantly, compute nodes can
keep the blockchain running even when they do not have access to local
disks, which reduces the persistent data size and the I/O overhead. To
verify the provenance of application data, this work also proposes a new
consensus mechanism called Proof-of-Reproducibility (PoR) [7]. The
underlying idea behind PoR is that consensus may be reached by combining
elements of Proof-of-Work (PoW) and Proof-of-Stake (PoS), two consensus
mechanisms used in distributed ledger technologies (DLT). The soundness of
this consensus protocol is formally proven. A system prototype of around
1,800 lines of Java code was developed to verify the proposed ideas [8]. We
demonstrate the practicality of the proposed blockchain architecture and
consensus protocol in the context of HPC systems through experimental
evaluation of the system's effectiveness.

The rest of this chapter is organized as follows. Section 7.2 reviews
relevant literature. Section 7.3 presents a blockchain architecture
specifically designed for HPC. Section 7.4 elaborates on the implementation
details of a prototype for the proposed blockchain system and outlines the
experimental findings. Section 7.5 concludes the chapter.


7.2 RELATED WORK 


This section reviews the research related to enhancing the security of HPC
data clusters through in-memory blockchain. Ref. [9] presents a methodology
for querying provenance that distinguishes between “where” and “why”
provenance within databases. Over the past decade, there has been an
unprecedented surge in data traffic, bringing unique attention to the concept of
HPC Data Clusters with In-Memory Blockchain. This innovative approach 
is under scrutiny across various scientific and engineering domains, includ- 
ing operations management, computer vision, and smart cities. Numerous 
techniques have been proposed to efficiently leverage HPC Data Clusters 
with In-Memory Blockchain. For instance, the integration of HPC Data 
Clusters with In-Memory Blockchain provides mobile networks with 
enhanced opportunities to elevate service quality. Examining the features 
of HPC Data Clusters with In-Memory Blockchain from both mobile net- 
work operators’ and users’ perspectives, the research in Ref. [10] focuses 
on the incorporation of mobile networks. This requires combining infor- 
mation from several sources, such as network operators’ radio access net- 
works, internet service providers, and core networks; user data consists of 
details about the user’s profile and location. The success or failure of mobile 
network services is heavily dependent on how well network operators can 
interpret this data [11]. When it comes to optimizing networks, effective 
data analysis tools are absolutely necessary. There are a number of obsta- 
cles to improving service quality with HPC Data Clusters with In-Memory 
Blockchain, despite the many benefits and uses for such a system. 

Recent research on blockchain technology has shifted its focus to various
systemic perspectives. For instance, Ref. [12] introduced a design that
employs network-coded distributed storage to address storage bloat in
blockchains. Ref. [13] investigates methods to safeguard blockchain networks
against attacks from quantum computing. Ref. [14] provides comprehensive
recommendations for enhancing the hardware-level reliability of blockchain
topologies. Data provenance via blockchain has been investigated in a few
studies [15-17], although these have been narrow in scope and have typically
dealt with very small datasets. Gabriel and Markus examined how blockchain
DLT relates to the PROV standards for data provenance and how DLT might
facilitate data provenance in cloud-based HPC data clusters using in-memory
blockchain.

Blockchain research is currently exploring a wide range of system-level
proposals. Algorand [18] presents a new approach to thwarting Sybil and
targeted attacks. To improve throughput, Bitcoin-NG [19] elects a leader in
each epoch who may publish many blocks. Monoxide [20] lessens the load on
overloaded nodes by distributing computing, storage, and memory resources
across multiple zones. Sharding protocols [21,22] aim to scale out
distributed ledgers, whereas Hawk [23] aims to keep transactions on public
blockchains private. The Ouroboros protocol [24] is a PoS protocol that is
reported to outperform PoW blockchains in terms of efficiency and security.
Recent publications [25-27] offer methods to enhance Byzantine Fault
Tolerance (BFT). Furthermore, new consensus protocols are being developed at
a rapid pace, with designs tailored to in-memory architecture or to
establishing unique security identities for network participants. Examples
of these protocols include PoR [28] and proof-of-vote (PoV) [29].

Inkchain [30] is another permissioned blockchain solution that draws
inspiration from Hyperledger [31], providing flexibility and improvements in
various situations. Also drawing inspiration from Hyperledger, BigchainDB
[32] uses Practical Byzantine Fault Tolerance (PBFT) ideas to improve
reliability and fault tolerance. It combines the decentralized and immutable
nature of blockchain technology with the low latency, high transaction
rates, and structured data indexing and querying capabilities of databases.
Separately, a two-layer blockchain architecture (2LBC) is presented in
Ref. [33] to strengthen data integrity in distributed database systems; it
combines a leader-rotation strategy with PoW methods. Notably, none of the
studies mentioned so far addresses the fundamental platform architecture
beyond shared-nothing clusters when bridging the gap between blockchain
technology and HPC (Figure 7.3).


Figure 7.3 In-memory blockchain architecture.


Figure 7.4 Blockchain transaction graph system architecture.


This paper introduces a novel blockchain framework, marking the first
instance of a practical parallel blockchain-like system developed using
message passing interface (MPI). This framework is designed to harness
decentralized mechanisms within HPC systems. While acknowledging recent
progress in in-memory blockchain systems [3], it is important to note
advancements in blockchain systems within the MPI and HPC communities. In
the past, several techniques [19,30,34,35,36] have concentrated on improving
or characterizing MPI properties for a variety of solutions. These efforts
differ from ours in that they do not aim to build a new blockchain
architecture that uses MPI to manage distributed ledgers in parallel. As a
result, our framework can incorporate these earlier efforts to further
improve its MPI-specific components.

Figure 7.4 shows the design of our blockchain graph analysis system. Block
data can be obtained through a blockchain node or an Ethereum gateway, such
as Infura or Cloudflare. We chose the Cloudflare gateway for block retrieval
because synchronizing a node can be time-consuming. After retrieval, the
blocks are processed to detect transactions that involve the ether
cryptocurrency or transfers of ERC20 tokens. These transactions are then
stored in files, which our blockchain graph system uses as input. The
dataset and all of its associated formats are available on the Zenodo
website [14]. In addition to ether, the system also retrieves 31 ERC20 token
contracts from blocks. These contracts include stablecoins such as USDT,
PAX, EURS, BUSD, GUSD, TRYB, and XAUT.
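One way to picture the input to such a graph system is as a weighted directed graph in which addresses are vertices and transfers are edges. The sketch below is only an illustration under stated assumptions: the record fields (`from`, `to`, `asset`, `value`) and the function names are hypothetical, since the text does not describe the actual file format or the system's interface.

```python
from collections import defaultdict


def build_transfer_graph(transactions, assets):
    """Aggregate transfers of the selected assets into a weighted digraph.

    Returns {sender: {receiver: total_value}} for the assets of interest.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tx in transactions:
        if tx["asset"] in assets:  # keep only ether or chosen ERC20 tokens
            graph[tx["from"]][tx["to"]] += tx["value"]
    return {src: dict(dsts) for src, dsts in graph.items()}


def out_degree(graph):
    """Number of distinct receivers per sender."""
    return {src: len(dsts) for src, dsts in graph.items()}
```

Filtering by asset before building the graph mirrors the processing step in the text, where only ether and ERC20 transfers are kept.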


7.2.1 Applications of blockchain 


Blockchain, renowned for its safety and security attributes, finds extensive 
applications across various domains. Its versatile applications include cryp- 
tocurrency, healthcare, supply chains, smart contracts, advertising, finan- 
cial services, IoT (Internet of Things), asset management, music, financial 
markets, voting, banking, cybersecurity, copyright and royalties, govern- 
ment, facilitating international payments, weapons tracking, cost reduc- 
tions, digital currency, financial management, identity management, and 
land registration. 


7.2.2 Overview of clustering algorithms 


Clustering is widely employed in data analysis to address the emerging
challenges associated with big data. This analytical approach partitions a
dataset into subsets such that instances within a subset are similar to one
another and dissimilar to instances in other subsets [3]. Various clustering
methods are used for this partitioning, including bi-clustering,
density-based, graph-based, grid-based, hard clustering, hierarchical,
model-based, partitioning, and soft clustering. The sequential stages of
clustering are shown in Figure 7.5.


Figure 7.5 Clustering stages.


The primary goal of clustering is to recognize both similar and dissimilar
characteristics or patterns within the provided data. Similar points are
determined using similarity functions. Subsequent to the clustering process,
class labels are assigned to the clusters, a step referred to as
classification. The clustering process takes input from various data sources
and yields clustered data as output. Clustering is widely applied in diverse
applications such as pattern recognition, image processing and analysis,
document categorization, stock market segmentation, exploratory data
analysis, World Wide Web, metrology, healthcare, and social network
analysis.
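As a concrete instance of a partitioning method driven by a similarity function, the sketch below implements a basic k-means loop with Euclidean distance. The chapter does not single out k-means, so the choice of algorithm, the deterministic seeding with the first `k` points, and the function names are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Distance used as the (dis)similarity function between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Assign each point to its nearest centroid until assignments stabilize."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic seed: first k points
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids
```

With two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and `k=2`, the loop converges to the labels `[0, 0, 0, 1, 1, 1]`. Seeding with the first points keeps the example reproducible, but practical implementations usually choose the initial centroids more carefully.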


7.3 THE NEW BLOCKCHAIN ARCHITECTURE 
FOR HPC 


Figure 7.3 depicts a typical HPC system together with our proposed
distributed in-memory ledger. Although the compute nodes are shown without
disks, some custom HPC systems do have node-local disks. For example,
Cori [33], a top-ten supercomputer at Lawrence Berkeley National Laboratory,
uses a burst buffer, a local Solid State Drive (SSD) storage system. Because
burst buffers are better suited to short-term data caching than long-term
archival, they are not a good fit for ledger applications. In line with the
non-time-sharing character of scientific applications, they are usually
purged after job execution for performance and security reasons. To address
the transient nature of ledger persistence on local storage, we propose a
secondary ledger and validator on remote storage. The figure depicts the
three main components of the proposed system: a distributed in-memory
ledger, persistence protocols, and a remote persistent ledger. Each module
is discussed in more depth below.

The first module deploys a distributed ledger across compute nodes that is
optimized for high-performance interconnects such as InfiniBand and
protocols such as Remote Direct Memory Access (RDMA). The goal is to
accelerate communication-intensive consensus methods, namely PBFT [34], by
using high-performance hardware. Hyperledger [13] and other permissioned
blockchains use PBFT and admit only verified users to the network. This sets
them apart from permissionless blockchain systems like Ethereum [14], which
are open to everybody, and it matters in the HPC setting because scientific
applications run under strict authentication mechanisms. The second module
synchronizes the diskless compute nodes with remote persistent storage. The
persistent ledger kept in remote storage strengthens data validity, and the
synchronization enhances data reliability. The main obstacle is deciding
when and how to export the ledger from memory to the remote parallel file
system: persisting after each transaction is not practical because the
considerable I/O overhead would create a performance bottleneck. The third
and final module validates the distributed ledger on persistent storage.
Since every compute node updates the ledger in volatile memory (or in
temporary local storage that is erased after a job finishes), there must be
an uncompromised ground truth on permanent storage against which ledger
copies can be validated after a failure (for example, when half of the
compute nodes crash and lose their ledgers). Note that compute nodes that
rely only on memory are not necessarily less dependable than those with
permanent storage. However, if the process that allocated the memory is
killed, the data held in memory is lost.
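The trade-off between persisting after every transaction and keeping a trustworthy copy on remote storage can be illustrated with a batched flush. In this sketch a local file stands in for the remote parallel file system, and the batching threshold `flush_every`, the class name, and the validation rule are hypothetical. The chapter's prototype is written in Java and may use a different persistence protocol.

```python
import hashlib
import json
import os
import tempfile


class InMemoryLedger:
    """Hash-chained ledger kept in memory and flushed to storage in batches."""

    def __init__(self, path, flush_every=4):
        self.path = path                # stand-in for remote shared storage
        self.flush_every = flush_every  # hypothetical batching parameter
        self.blocks = []
        self.persisted = 0

    def append(self, payload):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        data = prev + json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256(data.encode()).hexdigest()
        self.blocks.append({"prev": prev, "payload": payload, "hash": digest})
        if len(self.blocks) - self.persisted >= self.flush_every:
            self.flush()  # one I/O operation per batch, not per transaction

    def flush(self):
        with open(self.path, "a", encoding="utf-8") as fh:
            for block in self.blocks[self.persisted:]:
                fh.write(json.dumps(block, sort_keys=True) + "\n")
        self.persisted = len(self.blocks)


def load_ground_truth(path):
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]


def matches_ground_truth(local_blocks, truth):
    """A local copy passes if it contains the persisted chain, hash for hash."""
    if len(local_blocks) < len(truth):
        return False
    return all(lb["hash"] == tb["hash"] for lb, tb in zip(local_blocks, truth))


def demo(n_tx=6):
    """Append n_tx blocks and compare the in-memory copy with storage."""
    with tempfile.TemporaryDirectory() as tmp:
        ledger = InMemoryLedger(os.path.join(tmp, "ledger.jsonl"))
        for i in range(n_tx):
            ledger.append({"tx": i})
        truth = load_ground_truth(ledger.path)
        return ledger.persisted, len(truth), matches_ground_truth(ledger.blocks, truth)
```

With the default threshold, `demo()` persists the first four of six blocks in a single write and leaves the last two in memory only. A larger batch lowers the I/O overhead, but more blocks are lost if the process is killed before the next flush.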


7.4 SYSTEM IMPLEMENTATION 


The prototype system [21] for the proposed blockchain architecture and 
consensus methods is implemented in Java. The source code and other 
information are available on the project webpage at 
https://expolab.org/http://cse.unr.edu/hpdic/proj/imb. 
The prototype’s main modules are now available for download, and we 
plan to release supplementary components and plug-ins as soon as they pass 
our quality assurance tests. The current codebase has more than 1,800 
lines of code. In the prototype’s virtualized environment, local ledgers 
are kept as independent files, and network latency is emulated by a time 
delay drawn from random statistical distributions parameterized by mean, 
variance, and seed. In order to deploy the prototype on production 
systems, we are currently working on packaging it into Docker containers. 
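
The emulated network delay described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a normal distribution and microsecond units; the source only says that the delay is drawn from random distributions with a mean, a variation, and a seed, so the class name `LatencyModel` and its parameters are hypothetical.

```python
import random


class LatencyModel:
    """Hypothetical delay generator: seeded draws from a normal distribution."""

    def __init__(self, mean_us, variance_us2, seed):
        self.mean_us = mean_us
        self.std_us = variance_us2 ** 0.5
        # A private generator makes runs with the same seed reproducible.
        self.rng = random.Random(seed)

    def sample(self):
        # Clamp at zero because a delay cannot be negative.
        return max(0.0, self.rng.gauss(self.mean_us, self.std_us))


def sample_delays(mean_us, variance_us2, seed, count):
    """Return `count` emulated delays for the given distribution parameters."""
    model = LatencyModel(mean_us, variance_us2, seed)
    return [model.sample() for _ in range(count)]
```

Fixing the seed is what makes an experiment repeatable: two runs with the same mean, variance, and seed see the same sequence of delays.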

Our present emphasis is on the ledger and consensus, in accordance with 
the suggested system decomposition in Ref. [7], which specifies four primary 
components (ledger, consensus, cryptography, and smart contract). We use 
the SHA-256 hash algorithm [35] for the chained blocks. Because the current 
implementation does not support smart contracts, applications are wrapped 
as pseudo transactions with dummy numerical values, which allows the 
prototype to accept multiple applications. The ledger and consensus 
implementation are covered in detail in the following sections. The suggested PoR 
consensus relies on a consensus protocol between compute nodes, which 
is similar to traditional PoW but simplifies the computationally difficult 
problem found in Bitcoin [12]. Our system’s PoW consists of three primary 
components. First, a node stores freshly made transactions, which are then 
sent to other nodes and shared storage, until they form a block that can be 
mined. Second, every node in the network that can process data, including 
those with shared storage, will try to verify the block and add it to their 
local blockchain. In mining, the node that finds the solution adds the block 
to its local in-memory blockchain and gets the reward. If the block does not 
pass validation during this round, it will be queued up on the corresponding 
compute node for processing at a later time. This phase is in line with the 
standard PoW between compute nodes, but with a modified PoW (i.e., PoR) 
designed for the provenance of scientific data, which is then 
double-validated against the storage node’s ground truth. Third, consensus 
is achieved when a node adds the block to its local blockchain and stores 
it in shared storage. The block is then sent to the other peers in the 
network for validation. Upon successful validation, the first node adds the 
block to its local blockchain and sends it out to the rest of the network. 
Each node then updates its local blockchain once it has confirmed the 
block’s legitimacy. 
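
The chaining of blocks with SHA-256 and the check of a local ledger against the ground truth on shared storage can be sketched as follows. This is a simplified Python illustration, not the Java prototype: the block layout, the function names, and the comparison by hash are assumptions, and the mining puzzle is omitted.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions):
    """SHA-256 digest over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Append a block whose prev_hash links it to the current tail."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def is_valid_chain(chain):
    """Recompute every hash and check every link to the previous block."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["transactions"]):
            return False
        prev_hash = block["hash"]
    return True


def matches_ground_truth(local_chain, ground_truth_chain):
    """Hypothetical check: a local copy is accepted only if it is internally
    valid and agrees block by block with the persistent copy."""
    if not is_valid_chain(local_chain):
        return False
    if len(local_chain) != len(ground_truth_chain):
        return False
    return all(a["hash"] == b["hash"]
               for a, b in zip(local_chain, ground_truth_chain))
```

Because each block stores the hash of its predecessor, changing any earlier transaction invalidates that block's own hash and breaks the check.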


7.5 RESULTS AND DISCUSSION 


The experiments use Intel Core-i7 4.2 GHz CPUs and 32 GB of 2,400 MHz 
DDR4 memory [21]. The latency of an InfiniBand network is 1 µs [36], 
whereas that of an Ethernet network is 250 µs. The evaluation compares the 
proposed in-memory blockchain prototype with two other blockchain systems 
and two provenance systems. The first blockchain system is a conventional 
blockchain deployed on a shared-nothing cluster connected by Ethernet. 
The second is a hypothetical blockchain that does not use permanent 
storage but instead relies on high-performance interconnects such as 
InfiniBand or RDMA. The lack of data durability makes it impractical, but 
it sets the performance upper bound for the proposed in-memory blockchain. 
All three blockchain systems are implemented in Java, and adequate 
optimization efforts have been made. The two provenance systems are 
built-in modules of a database and a file system, respectively. 

Database provenance monitoring and querying is the main focus of 
SPADE [4], a graph database system. FusionProv [1], built on top of the 
distributed file system FusionFS [38], is the provenance module for 
file-system provenance. The benchmark transactions are formatted like bank 
transfers. At the beginning of each transaction, the system checks the 
legitimacy of the submitted transaction. If the request is legitimate, the 
statuses (balances) of two nodes are modified and the new information is 
sent to every other node in the network. In our experiments, each block 
includes between one and ten transactions, with an average of four 
transactions per block. 
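
The bank-transfer style workload described above can be illustrated with a short Python sketch. The validation rules (both accounts exist, positive amount, sufficient balance) and all names are assumptions chosen for illustration; the source does not spell out what makes a transaction legitimate.

```python
def is_valid_transfer(balances, tx):
    """Assumed legitimacy check: known accounts, positive amount, enough funds."""
    sender, receiver, amount = tx["from"], tx["to"], tx["amount"]
    return (
        sender in balances
        and receiver in balances
        and sender != receiver
        and amount > 0
        and balances[sender] >= amount
    )


def apply_transfer(balances, tx):
    """Validate first; on success update the two affected balances."""
    if not is_valid_transfer(balances, tx):
        return False
    balances[tx["from"]] -= tx["amount"]
    balances[tx["to"]] += tx["amount"]
    return True


def broadcast(replicas, tx):
    """Apply the same transfer to every node's copy of the balances."""
    return [apply_transfer(balances, tx) for balances in replicas]
```

A rejected transfer leaves every replica unchanged, so all copies stay identical whether or not the request was legitimate.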


7.5.1 Comparison to filesystems and databases 


In this research, we present an in-memory blockchain as a provenance tool. 
Two other tools, a distributed filesystem for HPC [38] and a relational 
database for graph processing [4], each impose their own latency overhead. 
With an in-memory blockchain, latency is incurred only by the node that 
wins the race and appends the block; unlike in the other two systems, the 
remaining nodes incur no latency. To make the comparison fair, we amortize 
the in-memory blockchain latency of that one node over all nodes. Since 
the database-based provenance cannot scale beyond tens of nodes, the 
comparison is restricted to no more than 32 nodes. At small scale (four 
nodes), the in-memory blockchain has latency overhead similar to the 
distributed filesystem, as shown in Figure 7.6. However, when the node 
count increases to eight or more, the in-memory blockchain outperforms the 
distributed filesystem. At every scale, the in-memory blockchain far 
surpasses the graph database. At 32 nodes, the suggested in-memory 
blockchain outperforms the filesystem by a factor of 32 and the database 
by a factor of four. Another notable feature of the in-memory blockchain 
is that the amortized per-node latency drops as the number of nodes 
increases. 

Figure 7.6 Latency overhead of in-memory blockchain, distributed file system, and graph 
database. 


7.6 CONCLUSION 


This chapter presents a new architecture for blockchain systems that pri- 
marily uses memory to store ledgers, along with a consensus process that 
is specifically intended for this architecture. These developments, taken 
as a whole, make it possible to run a blockchain-like ledger service effi- 
ciently, which guarantees trustworthy data provenance on HPC systems. 
Experimental validation and theoretical support both back up the proposed 
consensus. An evaluation of a lightweight system prototype with more than 
one million transactions showed a 32x acceleration when compared to 
provenance services based on filesystems and a four orders of magnitude 
acceleration when compared to provenance services based on databases. 




AUTHOR DETAILS 


Dr. Srikanth Thota received his Ph.D. in 
Computer Science Engineering for his research 
work in Collaborative Filtering-based 
Recommender Systems from JNTU, Kakinada. 
He received his M.Tech. degree in Computer 
Science and Technology from Andhra University. 
He is presently working as an associate 
professor in the Department of Computer 
Science and Engineering, School of Technology, 
GITAM University, Visakhapatnam, Andhra 
Pradesh, India. His areas of interest include 
machine learning, artificial intelligence, data 
mining, recommender systems, and soft 
computing. 


Mr. Maradana Durga Venkata Prasad received his 
B.Tech. (Computer Science and Information 
Technology) in 2008 from JNTU, Hyderabad, 
and M.Tech. (Software Engineering) in 2010 from 
Jawaharlal Nehru Technological University, 
Kakinada. He is a research scholar (Regd. 
No. 1260316406) in the Department of 
Computer Science and Engineering, Gandhi 
Institute of Technology and Management 
(GITAM), Visakhapatnam, Andhra Pradesh, India. 
His research interests include clustering in data 
mining, Big Data analytics, and artificial 
intelligence. He is currently working as an 
Assistant Professor in the Department of 
Computer Science Engineering, CMR Institute 
of Technology, Ranga Reddy, India. 


REFERENCES 


1. A. Gehani and D. Tariq, “SPADE: Support for provenance auditing in distrib- 
uted environments,” In Proceedings of the 13th International Middleware 
Conference (Middleware), Montreal, QC, Canada 2012. 

2. T. Clark, P. N. Ciccarese, and C. A. Goble, “Micropublications: A semantic 
model for claims, evidence, arguments and annotations in biomedical com- 
munications,” Journal of Biomedical Semantics, vol. 5, no. 1, p. 28, 2014. 

3. E. Pettersen, T. Goddard, C. Huang, G. Couch, D. Greenblatt, E. Meng, 
and T. Ferrin, “UCSF chimera: A visualization system for exploratory 
research and analysis,” Journal of Computational Chemistry, vol. 25, no. 13, 
pp. 1605-1612, 2004. 

4. R. Ranjan, D. Pandey, A. K. Rai, D. Gupta, P. Singh, P. R. Kumar, and S. 
N. Mohanty, “A manifold-level hybrid deep learning approach for sentiment 
classification using an autoregressive model,” Applied Sciences, vol. 13, no. 5, 
p. 3091, 2023. 


5. D. Dai, Y. Chen, P. Carns, J. Jenkins, and R. Ross, “Lightweight provenance 
service for high-performance computing,” In International Conference on 
Parallel Architectures and Compilation Techniques, Portland, OR, USA 2017. 

6. B. Shilpa, P. R. Kumar, and R. K. Jha, “LoRa DL: A deep learning model for 
enhancing the data transmission over LoRa using autoencoder,” The Journal 
of Supercomputing, vol. 79, pp. 17079-17097, 2023. 

7. X. Liang, S. Shetty, D. Tosh, C. Kamhoua, K. Kwiat, and L. Njilla, “Provchain: 
A blockchain-based data provenance architecture in cloud environment with 
enhanced privacy and availability,” In IEEE/ACM International Symposium 
on Cluster, Cloud and Grid Computing (CCGRID), Madrid, Spain 2017. 

8. Aravind Ramachandran and M. Kantarcioglu, “Smartprovenance: A 
distributed, blockchain based data provenance system,” In Proceedings of the 
Eighth ACM Conference on Data and Application Security and Privacy, 
Series, CODASPY’18, NY, United States pp. 35-42, 2018. 

9. X. Chen, S. Lin, and N. Yu, “Bitcoin blockchain compression algorithm 
for blank node synchronization,” In Proceedings of 11th International 
Conference on Wireless Communications and Signal Processing (WCSP), 
Xian, China, pp. 1-6, October 2019. 

10. R. Ranjan, D. Pandey, A. K. Rai, D. Gupta, P. Singh, P. R. Kumar, and S. 
N. Mohanty, “A manifold-level hybrid deep learning approach for sentiment 
classification using an autoregressive model,” Applied Sciences, vol. 13, no. 5, 
p. 3091, 2023. 

11. Z. Guo, Z. Gao, H. Mei, M. Zhao, and J. Yang, “Design and optimization 
for storage mechanism of the public blockchain based on redundant residual 
number system,” IEEE Access, vol. 7, pp. 98546-98554, 2019. 

12. Y. Xu, “Section-blockchain: A storage reduced blockchain protocol, the 
foundation of an autotrophic decentralized storage architecture,” In Proceedings of 
23rd International Conference on Engineering of Complex Computer Systems 
(ICECCS 2024), Melbourne, VIC, Australia pp. 115-125, December 2018. 

13. N. Arivazhagan, K. Somasundaram, G. B. Mohammad, P. R. Kumar, et al., 
“Cloud-Internet of Health Things (IOHT) task scheduling using hybrid moth 
flame optimization with deep neural network algorithm for e healthcare 
systems,” Scientific Programming, vol. 2022, pp. 1-12, 2022. 

14. T. Liu, J. Wu, J. Li, and J. Li, “Secure and balanced scheme for nonlocal 
data storage in blockchain network,” In Proceedings of 2019 IEEE 
21st International Conference on High Performance Computing and 
Communications; IEEE 17th International Conference on Smart City; 
IEEE 5th International Conference on Data Science and Systems (HPCC/ 
SmartCity/DSS), Zhangjiajie, China, pp. 2424-2427, 2019. 

15. P. R. Kumar, G. B. Mohammad, and P. Dileep, “Real-time heart rate 
monitoring system using least square method,” Annals of the Romanian Society 
for Cell Biology, vol. 25, no. 6, pp. 16302-16308, 2021. 

16. L. Aniello, R. Baldoni, E. Gaetani, F. Lombardi, A. Margheri, and V. Sassone, 
“A prototype evaluation of a tamper-resistant high performance 
blockchain-based transaction log for a distributed database,” In 13th European 
Dependable Computing Conference (EDCC), Geneva, Switzerland 2017. 

17. D. Dai, Y. Chen, P. Carns, J. Jenkins, and R. Ross, “Lightweight provenance 
service for high-performance computing,” In International Conference on 
Parallel Architectures and Compilation Techniques (PACT), Portland, OR, 
USA 2017. 


18. D. Dai, Y. Chen, D. Kimpe, and R. Ross, “Provenance-based object storage 
prediction scheme for scientific big data applications,” In IEEE International 
Conference on Big Data (BigData), Washington, DC, USA 2014. 

19. G. B. Mohammad, Selvarajan Shitharth, and P. R. Kumar, “Integrated machine 
learning model for an URL phishing detection,” International Journal of Grid 
and Distributed Computing, vol. 14, no. 1, pp. 513-529, 2021. 

20. X. Niu, R. Kapoor, B. Glavic, D. Gawlick, Z. H. Liu, V. Krishnaswamy, and 
V. Radhakrishnan, “Provenance-aware query optimization,” In IEEE 33rd 
International Conference on Data Engineering (ICDE), San Diego, CA, USA 
2017. 

21. P. Mehta, S. Dorkenwald, D. Zhao, T. Kaftan, A. Cheung, M. Balazinska, 
A. Rokem, A. Connolly, J. Vanderplas, and Y. AlSayyad, “Comparative 
evaluation of big-data systems on scientific image analytics workloads,” In 
Proceedings of the 43rd International Conference on Very Large Data Bases 
(VLDB), Washington, DC, USA 2017. 

22. T. Li, C. Ma, J. Li, X. Zhou, K. Wang, D. Zhao, and I. Raicu, “Graph/z: 
A key-value store based scalable graph processing system,” In IEEE 
International Conference on Cluster Computing, Chicago, IL, USA 2015. 

23. P. R. Kumar and T. Ananthan, “Machine vision using LabVIEW for label 
inspection,” Journal of Innovation in Computer Science and Engineering 
(JICSE), vol. 9, no. 1, pp. 58-62, 2019. 

24. D. Zhao, J. Yin, K. Qiao, and I. Raicu, “Virtual chunks: On supporting 
random accesses to scientific data in compressible storage systems,” In IEEE 
International Conference on Big Data, Washington, DC, USA pp. 231-240, 
2014. 

25. D. Zhao, J. Yin, and I. Raicu, “Improving the I/O throughput for data 
intensive scientific applications with efficient compression mechanisms,” In 
International Conference for High Performance Computing, Networking, 
Storage and Analysis (SC’13), Poster Session, Geneva, Switzerland 2013. 

26. P. R. Kumar, “Wireless mobile charger using inductive coupling,” Journal 
of Emerging Technologies and Innovative Research (JETIR), vol. 5, no. 10, 
pp. 40-44, 2018. 

27. H. Qin, S. Zawad, Y. Zhou, L. Yang, D. Zhao, and F. Yan, “Swift machine 
learning model serving scheduling: a region based reinforcement learning 
approach,” In Proceedings of the International Conference for High 
Performance Computing, Networking, Storage and Analysis (SC), Montreal, 
QC, Canada 2019. 

28. E. Saillard, P. Carribault, and D. Barthou, “Static/dynamic validation of MPI 
collective communications in multi-threaded context,” In Proceedings of the 20th 
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 
PPoPP 2015. ACM, New York, United States. pp. 279-280, 2015. 

29. B. Shilpa, P. R. Kumar, and R. K. Jha, “Spreading factor optimization for 
interference mitigation in dense indoor LoRa networks,” IEEE IAS Global 
Conference on Emerging Technologies (GlobConET), London, UK pp. 1-5, 
2023. 

30. C. Shou, D. Zhao, T. Malik, and I. Raicu, “Towards a provenance-aware 
distributed filesystem,” In TaPP Workshop, USENIX Symposium on Networked 
Systems Design and Implementation (NSDI), Chicago, USA, 2013. 


31. J. Sousa, A. Bessani, and M. Vukolic, “A byzantine fault-tolerant ordering 
service for the hyperledger fabric blockchain platform,” In 48th Annual 
IEEE/IFIP International Conference on Dependable Systems and Networks 
(DSN), Luxembourg, Luxembourg 2018. 

32. J. Wang and H. Wang, “Monoxide: Scale out blockchain with asynchronized 
consensus zones,” In 16th USENIX Symposium on Networked Systems Design 
and Implementation (NSDI 19), Boston, MA, 2019. USENIX Association. 

33. P. R. Kumar, “Position control of a stepper motor using LabVIEW,” 3rd 
International Conference on Recent Trends in Electronics, Information 
Communication Technology (RTEICT), Bangalore, India pp. 1551-1554, 
May 2018. 

34. M. Zamani, M. Movahedi, and M. Raykova, “Rapid chain: Scaling 
blockchain via full sharding,” In Proceedings of the 2018 ACM SIGSAC Conference 
on Computer and Communications Security. ACM, New York, United States 
pp. 931-948, 2018. 

35. K. Zhang and H.-A. Jacobsen, “Towards dependable, scalable, and pervasive 
distributed ledgers with blockchains,” In 38th IEEE International Conference 
on Distributed Computing Systems (ICDCS), Vienna, Austria 2018. 

36. P. R. Kumar and B. Shilpa, “An IoT-based smart healthcare system with edge 
intelligence computing,” In S. Satpathy, S. N. Mohanty, and S. Potluri (eds.), 
Reconnoitering the Landscape of Edge Intelligence in Healthcare. CRC 
Press, Boca Raton, FL, pp. 31-46, 2024. 


Chapter 8 
DeepShield 


A deep learning approach for 
robust fraud detection in credit 
financial transactions 


Mulagundla Sridevi, Gouthami Velakanti, 
B. Deevena Raju, and Sadda Bharath Reddy 


8.1 INTRODUCTION 


Financial statements are prepared by accounting and finance departments 
and then examined by regulatory agencies like SEBI and RBI to make sure 
they are authentic. A nation’s economic prosperity is dependent on the sta- 
bility of its businesses and the security of its investors’ money. The financial 
situation, investments, obligations, interest paid, and interest earned can 
all be found in the financial statements. A company’s financial health is a 
true reflection of how its assets and liabilities have grown over time. 
Companies use these statements to obtain additional loans or investments. 
Rating agencies use them to assign credit scores, and investors use them 
to make educated investment decisions. Governments or third parties uti- 
lize the statements to recognize exceptional performance with prizes, while 
creditors use them to approve or recoup debts. 

Because outside parties use these statements to evaluate a company’s 
financial health, corporations may falsify them in order to appear more 
attractive to potential investors, lenders, or award givers. Fraud is 
becoming more common in emerging economies, so it is crucial to confirm 
the legitimacy of financial statements before making decisions. One option 
for confirmation is a machine learning (ML) model, such as a 
classification model trained on labeled financial records. Professional 
auditors label statements with terms such as credible or dishonest. 

These ML models play a crucial role in categorizing financial information 
as enterprises undergo digital transitions. Not only do they reveal 
whether fraud is present, but they can also explain why some statements 
are deemed fraudulent. When conducting manual audits, it is critical for 
auditors to have a firm grasp of the main causes of fraud. Although the 
steps for building a classification model and for extracting key factors 
are the same, it is common practice to use an additional method, such as 
SHAP, to determine which factors are crucial. Several algorithms have 
been investigated in the past for detecting financial statement fraud. 
Classification methods such as logistic regression (LR), XGBoost, 
AdaBoost, and neural networks (NNs) are commonly employed; the models 
are selected according to the dataset’s characteristics and the level of 
accuracy needed. But not every algorithm can be explained. Models such as 
NNs, XGBoost, or AdaBoost produce reliable classifications but do not 
reveal what criteria were used to arrive at a given label, which is why 
they are commonly called ‘black box’ models. 

Successful alternative methods for accurate categorization include sup- 
port vectors [4,5,6] and Zipf’s law [7]. NNs and belief networks [8], NNs 
using the backpropagation algorithm [9], NNs for risk assessment [10], pre- 
processing methods for data preparation in training fraud detection models 
[11], the importance of digit distribution [12], and other significant meth- 
odologies [13] have all been extensively studied in previous research. By 
employing algorithms such as regression and tree-based approaches, the 
authors tested 38 models with various dataset combinations in order to 
determine the best model and important factors [14]. The approach included 
training the models using different techniques and datasets that were either 
over-sampled or under-sampled. 

Prediction accuracy in traditional ML models depends on features and 
feature engineering: many older models feed manually derived features into 
a classification layer to estimate probabilities. Deep learning models 
reduce this reliance on feature engineering. Recent studies have looked 
into employing deep learning algorithms such as convolutional neural 
networks (CNNs) and deep neural networks (DNNs) to identify fraudulent 
activities and extract important features. CNNs were developed in Ref. 
[16], while DNNs have evolved over time thanks to the contributions of 
several scholars [15]. Although CNNs first gained traction in image 
processing, they have since found use in other domains, such as 
Natural Language Processing (NLP) and fraud detection [17]. 


8.2 RELATED WORK 


The ever-increasing volume of online purchases has put credit card fraud 
detection at the forefront of research priorities. Class imbalance and data’s 
inherent volatility are two major obstacles to effective fraud detection [18]. 
Resampling is one of several approaches that have been developed to solve 
the long-standing issue of class imbalance in credit card fraud detection 
[19]. To do this, a training dataset must be balanced, which means either 
under-sampling the majority class or over-sampling the minority class [20]. 
Bagging, boosting, and stacking are ensemble approaches that have also 
been used to address class imbalance [23]. One alternative is cost-sensitive 
learning, which assigns a cost to each type of misclassification error and 
usually gives a higher cost to errors on the minority class [24]. 
In addition to imbalance in number, the spatial distribution of cases 
from different classes greatly affects the results of classifiers. Cases 
close to the border between majority and minority classes, for example, 
are critical for correct identification, which is why techniques like 
Gaussian mixture under-sampling have been proposed [25]. Changing user 
transaction habits lead to variations in consumption seasonality and fraud 
patterns, which in turn make transaction data change dynamically. 
Conventional approaches to fraud detection, like CNNs, random forests 
(RFs), and support vector machines (SVMs), frequently presume homogeneous 
classes and a constant data distribution. A thorough analysis of 
techniques used to identify credit card fraud from 1990 to 2017 reveals an 
emphasis on improving classifier performance through combined class 
imbalance processing methods, with little regard for the dynamic change in 
data [23]. Concept drift is one form of this ongoing change; research into 
this topic has focused on detecting new concepts quickly and updating 
classifiers to account for them [5]. Because fraudulent and legitimate 
transactions are fundamentally different, it is critical to derive 
accurate representations that consistently differentiate between the two, 
even as fraud tactics change. Building a reliable model for detecting 
fraud using deep representation learning techniques is the main focus of 
this article [12]. 
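
The resampling and cost-sensitive approaches to class imbalance described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: labels are 0 (legitimate) and 1 (fraud), both classes are non-empty, and the inverse-frequency weighting is one common heuristic rather than the scheme of any cited study. All function names are hypothetical.

```python
import random


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    return (fraud, legit) if len(fraud) <= len(legit) else (legit, fraud)


def random_undersample(samples, labels, seed=0):
    """Drop majority-class samples at random until both classes are equal in size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until both classes are equal in size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    kept = majority + minority + extra
    return [samples[i] for i in kept], [labels[i] for i in kept]


def class_weights(labels):
    """Cost-sensitive alternative: weight each class inversely to its frequency."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}
```

Under-sampling discards data, over-sampling repeats it, and class weights leave the data untouched while making errors on the rare class more expensive during training.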


8.2.1 Deep supervised representation learning 


In order to improve the performance of classifiers or predictors, representa- 
tion learning entails learning a new representation for the provided data, 
with the goal of capturing more valuable information [12]. This method 
has been quite effective in many fields, but it has especially shined in super- 
vised learning-based large-scale visual categorization for feature extrac- 
tion [25]. The visual domain has been the site of a great deal of research 
into deep representation learning, with many studies taking advantage of 
open-source datasets such as ImageNet, LFW, and COCO [2,7,8]. The per- 
formance limits of the related representation model are heavily influenced 
by the architecture of a deep NN. The depth and breadth of CNNs can be tuned 
to control their capacity, and several designs have found success with this 
approach, such as ResNet, DenseNet, and BagNet [1,3,9]. 

To improve classification performance and create a balanced dataset, 
a mechanism was proposed in the context of credit card fraud detection 
that uses K-means clustering and a genetic algorithm to generate new data 
samples for minority clusters [14]. The method first forms clusters of 
similar data points using unsupervised learning; a genetic algorithm, 
inspired by natural selection and genetics, then generates fresh samples 
for the minority classes. The goal is to build training sets for card 
fraud detection that are more evenly distributed and lead to fewer 
classification errors. Ensemble learning was the subject of another study 
that sought to identify instances of credit card fraud [15]. Ensemble 
learning combines different ML classifiers in order to make better 
predictions. To identify fraud and non-fraud cases accurately, the study 
employs RFs and Artificial Neural Networks (ANNs): a mix of three 
feed-forward NNs with distinct hyperparameters and two RF classifiers 
with different decision trees (DT). The study acknowledges the significant 
monetary cost of misclassifying both legitimate and fraudulent 
transactions. 
Finally, the output is determined by averaging the results of the five models. 
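
Averaging the outputs of several models, as in the five-model ensemble above, can be sketched as follows. The sketch assumes each model returns a fraud probability per transaction and that the mean is thresholded at 0.5; the threshold and the function name are assumptions, not details from the cited study.

```python
def average_ensemble(model_probs, threshold=0.5):
    """Average per-model fraud probabilities and threshold the mean.

    model_probs: one list of probabilities per model, all of equal length.
    Returns one 0/1 decision per transaction.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    decisions = []
    for j in range(n_samples):
        mean_p = sum(probs[j] for probs in model_probs) / n_models
        decisions.append(1 if mean_p >= threshold else 0)
    return decisions
```

With three NN outputs and two RF outputs passed as five lists, a transaction is flagged only when the models agree strongly enough on average, which dampens the effect of any single model's error.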

Credit card fraud detection was the subject of a comparative study in 
which a CNN, a multilayer perceptron (MLP), and a basic NN were 
evaluated. The self-generated dataset, which included 60,000 transactions 
and 12 features, was constructed using common variables in financial 
institution databases and conventional predictors for predictive modeling. 
The study used a learning rate of 0.001 with the ‘softmax’ activation 
function, and the imbalanced dataset was balanced by under-sampling. The 
accuracy levels achieved by MLP and CNN were 87.88% and 82.86%, 
respectively. Credit card fraud detection using a real-time deep learning 
model incorporating auto-encoders was presented in another work [17]. The 
confusion matrix, recall, accuracy, and precision were used as performance 
measures. Non-linear auto-regression predicted the majority of fraudulent 
transactions but also misclassified many genuine transactions. LR had the 
lowest misclassification error rate for valid transactions but was very 
inaccurate for fraudulent ones. The deep NN auto-encoder showed consistent 
performance, with a better prediction rate and less misclassification 
error. 
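
The performance measures named above follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions, with fraud as the positive class (label 1); the function names are hypothetical, and the F1 score is included because it combines precision and recall.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with fraud (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1; a ratio with a zero denominator is 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

On imbalanced data, accuracy alone can look good even when most fraud is missed, which is why recall and precision are reported alongside it.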

Credit card fraud detection using CNNs was discussed in Ref. [18] 
because of CNNs’ capacity to find hidden fraud tendencies and reduce over- 
fitting. Trading entropy was a new feature that was introduced as part of 
feature engineering, which also included creating aggregated features from 
transaction data. In order to ensure that the dataset was balanced, synthetic 
fraud samples were generated from actual fraudulent data using cost-based 
sampling. The evaluation criterion was the F1 score, and the CNN had six 
layers, similar to LeNet. For various sample sets, CNN outperformed NN, 
SVM, and RF; this was particularly true when the trading entropy feature 
was included. Ref. [19] also examined credit card fraud detection at the 
transaction level, highlighting how crucial it is to account for the 
passage of time in a series of transactions in order to capture the 
dynamic nature of fraud. To boost classification accuracy, statistical 
features derived from the actual features were included. 

Credit card fraud detection using CNN, stacked long short-term memory 
(SLSTM), and a CNN-LSTM hybrid model was discussed in Ref. [20]. CNN was 
good at learning from short sequences, whereas LSTM was good at learning 
from long ones. The study applied principal component analysis (PCA) for 
dimensionality reduction to a dataset from an Indonesian bank with 
different non-fraud to fraud ratios. According to the results, raising the 
ratio enhanced the classifier’s accuracy. In terms of training accuracy, 
the models ranked as follows: SLSTM, CNN-LSTM, and CNN. However, taking 
into account the datasets’ inherent imbalance, CNN outperformed CNN-LSTM 
and SLSTM according to area under the curve (AUC) values, highlighting 
the predominance of short-term relationships in fraud transaction patterns. 
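
For readers unfamiliar with AUC, the value can be computed as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate case. The following is a minimal pairwise sketch of that standard definition, with ties counted as one half; the function name is hypothetical and the quadratic loop is chosen for clarity, not speed.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    y_true holds 0/1 labels (1 = fraud); scores holds the model outputs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fraud case ranked above legitimate case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on how fraud and legitimate cases are ranked relative to each other, it is less sensitive to class imbalance than plain accuracy.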


8.3 METHODOLOGY 


The lack of publicly available datasets is a challenge for ML approaches 
used to detect credit card fraud. This is mostly due to the sensitive nature of 
financial data and the requirement to preserve user privacy. The results of 
ML models might change substantially across various datasets or business 
scenarios, yet most studies in this area use only one dataset. A key 
research goal of this study is therefore to investigate performance 
variations across three datasets with different feature and transaction 
counts. Another 
difficulty is the class imbalance problem, which occurs when there are far 
fewer instances of fraud than regular transactions. One of the secondary 
goals of this research is to learn how different sampling strategies for deal- 
ing with class imbalance affect the performance of the models. 

Many studies have proposed LR, SVMs, and DT for identifying credit card 
fraud. However, these algorithms can struggle with massive datasets. Deep 
learning algorithms such as CNN and LSTM, which are recommended for image 
classification and NLP, respectively, are able to handle enormous 
datasets. Examining the efficacy of various deep learning techniques for 
credit card fraud classification is the primary goal of this research. 
Data preprocessing is another important step in ML. This research 
therefore also examines the relationship between data preprocessing 
methods and classification performance in the context of detecting credit 
card fraud. 


8.3.1 One-dimensional CNN (1DCNN) 


In image processing in particular, the deep learning technique known as a
CNN is closely associated with spatial data. CNNs resemble ANNs but use
convolution layers, which can have different numbers of channels, for
hidden layer processing. ‘Convolution’ refers to the process of extracting
crucial information from data by sliding filters over it. One reason CNNs
are so popular in image processing is their ability to reduce features
automatically, which makes them more resistant to overfitting. Training a
CNN therefore does not require extensive data preprocessing. The primary
purpose of a CNN in image processing is to minimize processing by reducing
image size without losing information that is crucial for making
predictions [21]. Feature maps, channels, pooling, stride, and padding are
the essential concepts in CNNs. Unlike the well-known multilayer
perceptron (MLP) network, a CNN is not fully connected from layer to
layer. To reduce the number of parameters, a CNN shares one set of weights
per filter across all positions, whereas in an MLP each node has its own
weights. Pooling also improves feature detection by making it more robust
to changes in the size and position of elements in the image. The study
uses a 2DCNN and a 1DCNN to classify transactions as either fraud or
non-fraud. The 2DCNN is applied to the 30-feature European Card Dataset.
To feed the 2DCNN model, each transaction sample is transformed into a
two-dimensional image.
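
To make the ideas of shared filter weights, stride, padding, and pooling
concrete, the following is a minimal sketch in plain Python. It is an
illustration only, not the model used in this study: the function names
`conv1d` and `max_pool1d`, the six-feature transaction, and the kernel
values are all assumptions chosen for readability.

```python
def conv1d(signal, kernel, stride=1, padding=0):
    """Slide one shared kernel over a 1D signal.

    This is cross-correlation, which is what deep learning libraries
    usually call "convolution".
    """
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(padded) - k + 1, stride)
    ]


def max_pool1d(feature_map, size=2):
    """Keep the largest activation in each non-overlapping window."""
    return [
        max(feature_map[i:i + size])
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical transaction with six scaled features (invented values).
transaction = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0]
# Three weights, reused at every position (weight sharing).
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d(transaction, kernel)   # one feature map
pooled = max_pool1d(feature_map, size=2)    # halves the map length
print(feature_map, pooled)
```

The filter has only three parameters regardless of how many features the
transaction has, whereas a fully connected layer would need one weight per
input for every node.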


8.3.2 Long short-term memory network 


Long short-term memory, often known as LSTM, is a type of NN belonging to
the family of Recurrent Neural Networks (RNNs). RNNs are designed as
memory-enabled NNs, as opposed to regular NNs, which are unable to
remember past data and need to retrain for each new task. The vanishing
gradient problem, however, makes short-term memory a common challenge for
RNNs. Put simply,
during backpropagation, the gradient decreases as it travels backward in 
the network, resulting in minimal modifications to the weights. This means 
that RNNs can only store short-term information, as the earlier layers of 
the network do not learn much and cannot recall early examples in long 
sequences. LSTM networks overcome RNNs’ short-term memory problem 
by including a network memory (cell state) that is passed across each step. 
The forget gate, input gate, and output gate are the essential components
involved in each step. Each gate has a specific function: the forget gate
selects the data to be retained from the previous step, the input gate
selects the data to be added from the current step, and the output gate
determines the hidden state passed to the next step (Table 8.1).

The cell state is crucial to overcoming the constraints of short-term
memory: it retains important information beginning with the earliest
examples in the sequence. This is in contrast to the hidden state, which
has limited long-term memory but remembers the model’s past inputs.
Figure 8.1 shows the full flow of an LSTM cell, with each dotted box
representing a single step [22].


Table 8.1 LSTM structure

Layer 1                    Input
  Input shape              (1, number of features)
Layer 2                    Dense
  Number of LSTM blocks    50
  Activation function      ReLU
Layer 3                    Output
  Number of nodes          1
  Activation function      Sigmoid


Figure 8.1 A long short-term memory cell.

The forget gate is involved in the first step, shown in the red dotted
box. The previous hidden state (h_{t-1}) is combined with the current
input (x_t) and passed through a ‘sigmoid’ activation function, producing
values between 0 (forget completely) and 1 (retain completely). The next
step, shown in the yellow dotted box, is the input gate. Here, the
‘sigmoid’ and ‘tanh’ activation functions are each applied to the previous
hidden state and the current input, and the two results are multiplied.
The ‘sigmoid’ function decides which values from the current step to keep,
while the ‘tanh’ function regulates the network by squashing the candidate
values. Computing the cell state is the next step, shown in the purple
dotted box. The new cell state (C_t) is obtained by multiplying the
previous cell state (C_{t-1}) pointwise by the forget gate’s output and
then adding the input gate’s output. Calculating the new hidden state
(h_t) is the responsibility of the output gate, the last step shown in a
dotted box. The new cell state passes through a ‘tanh’ activation function
and is then multiplied by the output of a ‘sigmoid’ activation function
that takes the previous hidden state and the current input as input.
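
The four steps just described can be sketched as a single function. This
is a minimal illustration of a standard LSTM step in plain Python, not the
TensorFlow model used in the experiments; the parameter names (`Wf`, `Wi`,
`Wg`, `Wo` and their biases), the hidden size of 2, and all weight values
are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: forget gate, input gate, cell state, output gate."""
    z = h_prev + x_t  # concatenate previous hidden state and current input
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]    # forget
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]    # input
    g = [math.tanh(a) for a in affine(params["Wg"], z, params["bg"])]  # candidate
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]    # output
    # New cell state: keep part of the old state, add the gated candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: squash the cell state, then gate it.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


hidden_size, n_features = 2, 3  # hypothetical sizes


def filled(value):
    """Build a (hidden_size x (hidden_size + n_features)) constant matrix."""
    return [[value] * (hidden_size + n_features) for _ in range(hidden_size)]


params = {
    "Wf": filled(0.1), "bf": [0.0] * hidden_size,
    "Wi": filled(0.2), "bi": [0.0] * hidden_size,
    "Wg": filled(-0.1), "bg": [0.0] * hidden_size,
    "Wo": filled(0.3), "bo": [0.0] * hidden_size,
}
h_t, c_t = lstm_step([0.5, -1.0, 2.0], [0.0] * hidden_size,
                     [0.0] * hidden_size, params)
print(h_t, c_t)
```

Because the cell state is updated only by a pointwise multiplication and
an addition, information from early steps can survive many iterations,
which is how the LSTM sidesteps the short-term memory problem of plain
RNNs.
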


8.4 RESULTS AND DISCUSSION


In this study, we use datasets with different sample sizes and numbers of
features to evaluate classifiers’ ability to detect credit card fraud. We
use three separate datasets: ECD (European Card Data), SCD (Small Card
Data), and TCD (Tall Card Data). These datasets are highly imbalanced,
with far fewer occurrences of fraud than normal transactions, which is
typical of credit card fraud datasets. All three datasets are labeled with
‘0’ for no fraud and ‘1’ for fraud. What follows is a breakdown of each
dataset, along with its class imbalance percentage and other pertinent
characteristics.


8.4.1 European card data 


This dataset covers two days of transactions made by European cardholders
in September 2013. It was retrieved from Kaggle and originates from the
Machine Learning Group of Université Libre de Bruxelles. With 31 features
and 284,807 samples, it is quite extensive. Only 492 of the samples are
instances of fraud, which is 0.172% of the total dataset. With the
exception of ‘Time’ and ‘Amount’, all features in the dataset have been
transformed using PCA in order to preserve client privacy and protect
sensitive transaction details. ‘Time’ gives the number of seconds elapsed
since the first sample in the dataset, and ‘Amount’ gives the transaction
amount. We refer to this dataset as ECD (European Card Data).


8.4.2 Small card data 


This 3,075-sample dataset with 12 characteristics was sourced from Kaggle 
and is on the smaller side. There are numerical features and categorical 
features split evenly. Out of a total of 3,075 samples, 448 (or 14.6%) were 
found to be instances of fraud. Dataset features include merchant ID, date 
of transaction, average daily transaction amount, amount declined, num- 
ber of declines per day, foreign transaction status, high-risk country status, 
average daily chargeback amount, average 6-month chargeback amount, 
frequency of 6-month chargebacks, and fraudulent status. We call this
dataset SCD because it has few rows and columns.


8.4.3 Evaluation metrics 


The datasets have a large class imbalance, which makes accuracy alone the
wrong criterion for comparing models. In credit card fraud detection
systems, the main objective is to catch all instances of fraud while
avoiding false alarms, that is, legitimate transactions mistakenly flagged
as fraudulent. The nature of the problem dictates the choice of evaluation
metric. This research makes use of the confusion matrix, which classifies
cases as either positive (fraud) or negative (non-fraud). True positives
(TP) are fraud cases correctly predicted as fraud, true negatives (TN) are
non-fraud cases correctly predicted as non-fraud, false positives (FP) are
non-fraud cases predicted as fraud, and false negatives (FN) are fraud
cases predicted as non-fraud. The equations below define the evaluation
measures: accuracy, recall, precision, and F1 score.


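\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{8.1} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{8.2} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{8.3} \]

\[ \text{F1 score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{8.4} \]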


Precision improves as the number of false positives decreases, because it
is measured over the samples predicted as positive. Precision is an
appropriate indicator when the cost of false positives is substantial.
Recall is measured over the actual positives, so maximizing it means
minimizing the number of false negatives. High recall is typically
prioritized in scenarios where the cost of false negatives is large.
Striking a balance between the two is of the utmost importance for
detecting credit card fraud. Labeling all samples as fraudulent yields
perfect recall but poor accuracy, precision, and F1 score. Conversely,
labeling all samples as legitimate yields high accuracy but zero recall,
and precision and the F1 score are undefined because no sample is
predicted as positive. To conduct comparisons, this research makes use of
all four metrics: accuracy, precision, recall, and F1 score.
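
The two extreme cases above can be checked numerically. The sketch below
computes the four metrics from confusion-matrix counts; the function name
`classification_metrics` is hypothetical, and, purely for illustration,
the test set is assumed to have the same class counts as the full ECD
dataset (492 fraud and 284,315 non-fraud samples).

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, recall, precision, and F1; None where undefined."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


FRAUD, LEGIT = 492, 284_315  # assumed class counts (illustration only)

# Every sample labeled as fraud: no negatives are ever predicted.
all_fraud = classification_metrics(tp=FRAUD, fp=LEGIT, tn=0, fn=0)

# Every sample labeled as legitimate: no positives are ever predicted.
all_legit = classification_metrics(tp=0, fp=0, tn=LEGIT, fn=FRAUD)

print(all_fraud)
print(all_legit)
```

Under these assumptions the all-legitimate classifier reaches an accuracy
above 99.8% while detecting no fraud at all, which is why accuracy cannot
be used on its own.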


8.5 DISCUSSION 


8.5.1 Data preprocessing 


Data preprocessing is the first step of the experiment; it entails
inspecting all three datasets by hand and applying statistical procedures.
The goal of data preprocessing is to give the classifiers refined input so
that they produce optimal output. Missing data, categorical features,
differing feature scales, and high dimensionality are just a few of the
factors that can affect classifier effectiveness. Data exploration and
scaling are the two preprocessing approaches used in this work, together
with a train-test split. Table 8.2 provides important details about the
data exploration. No missing values were found in the ECD dataset, all of
its features were numerical, and no features were removed. The
‘Transaction date’ feature was dropped from the SCD dataset because it
contained no values at all, and all other categorical features were
converted into numerical ones. All data from the ECD and SCD sets were
used, whereas only a subset of the full TCD dataset was used. The ‘custID’
feature was removed from the TCD dataset since it contained only unique
values and did not contribute any information. All of the remaining values
in the TCD dataset were numerical.


Table 8.2 Data exploration

ECD
  Number of rows            284,807
  Number of columns         31
  Feature type              Numeric
  Missing values            None
  Dropped features          None
  Categorical to numeric    None
  Smaller sample used       No

SCD
  Number of rows            3,075
  Number of columns         12
  Feature type              Numeric + categorical
  Missing values            3,075
  Dropped features          ‘Transaction date’
  Categorical to numeric    ‘Merchant_id’, ‘Is declined’,
                            ‘isForeignTransaction’, ‘isHighRiskCountry’,
                            ‘isFradulent’
  Smaller sample used       No

TCD
  Number of rows            10,000,000
  Number of columns         9
  Feature type              Numeric
  Missing values            None
  Dropped features          ‘custID’
  Categorical to numeric    None
  Smaller sample used       Yes

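
The preprocessing steps described for the SCD dataset (dropping an empty
column, converting categorical flags to numbers, splitting, and scaling)
can be sketched as follows. This is a minimal plain-Python illustration
under stated assumptions, not the pipeline used in the study: the helper
names, the ‘Transaction_amount’ column, the Y/N encoding, the choice of
min-max scaling, the 25% test fraction, and all data values are
hypothetical.

```python
import random


def drop_empty_columns(rows):
    """Drop columns that are missing (None) in every row."""
    keep = [col for col in rows[0] if any(r[col] is not None for r in rows)]
    return [{col: r[col] for col in keep} for r in rows]


def encode_flags(rows, columns, positive="Y"):
    """Convert categorical Y/N flags to numeric 1/0."""
    return [
        {col: int(val == positive) if col in columns else val
         for col, val in r.items()}
        for r in rows
    ]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then cut into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(rows, columns):
    """Learn per-column minimum and maximum from the training rows only."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows))
            for c in columns}


def apply_min_max(rows, ranges):
    """Rescale columns using ranges learned by fit_min_max."""
    scaled = [dict(r) for r in rows]
    for col, (low, high) in ranges.items():
        span = (high - low) or 1.0  # guard against constant columns
        for r in scaled:
            r[col] = (r[col] - low) / span
    return scaled


# Hypothetical SCD-like rows; every value here is invented.
rows = [
    {"Transaction date": None, "Transaction_amount": 3000.0,
     "isForeignTransaction": "Y", "isFradulent": "Y"},
    {"Transaction date": None, "Transaction_amount": 120.0,
     "isForeignTransaction": "N", "isFradulent": "N"},
    {"Transaction date": None, "Transaction_amount": 560.0,
     "isForeignTransaction": "N", "isFradulent": "N"},
    {"Transaction date": None, "Transaction_amount": 45.0,
     "isForeignTransaction": "Y", "isFradulent": "N"},
]

rows = drop_empty_columns(rows)
rows = encode_flags(rows, {"isForeignTransaction", "isFradulent"})
train, test = train_test_split(rows, test_fraction=0.25, seed=0)
ranges = fit_min_max(train, ["Transaction_amount"])
train, test = apply_min_max(train, ranges), apply_min_max(test, ranges)
```

The scaling ranges are learned from the training rows only and then
applied to the test rows, so that no information from the test set leaks
into training.
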
Next, having explored the data, we look at the association between
features in each dataset. One statistical tool for exploring relationships
between variables is the correlation coefficient, which takes values
between -1 and 1. A correlation value of 0 indicates no association, a
negative value an inverse relationship, and a positive value a direct
relationship. Identifying correlations helps reduce the data’s
dimensionality, because features that behave similarly can be eliminated.
Lower-dimensional data improves both training time and classification
performance. Feature correlation in the SCD dataset is shown in
Figure 8.2. No features are negatively correlated, and correlation is
unevenly distributed across the SCD dataset. Larger transaction amounts
are associated with higher average daily transaction amounts. There is a
strong relationship between the daily chargeback amount, the 6-month
chargeback amount, and the 6-month chargeback frequency. Among the
features used to classify fraud, the ‘high-risk country’ attribute stands
out. No dimensionality reduction is done because the dataset is small. The
performance is assessed using the test dataset, and Figure 8.3 shows a
comparison of various class imbalance ratios in the ECD dataset.
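
As a small worked example of the correlation coefficient, the sketch below
computes the Pearson correlation between two features in plain Python. The
text does not state which correlation coefficient was used, so Pearson is
an assumption here, and the chargeback values are invented for
illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature columns (invented values).
daily_chargeback = [0.0, 10.0, 20.0, 30.0]
six_month_chargeback = [0.0, 50.0, 100.0, 150.0]  # moves with the first
declines_per_day = [9.0, 6.0, 3.0, 0.0]           # moves against the first

direct = pearson(daily_chargeback, six_month_chargeback)
inverse = pearson(daily_chargeback, declines_per_day)
print(round(direct, 6), round(inverse, 6))
```

A pair of features with a coefficient close to 1, such as the two
chargeback columns in this example, carries largely redundant information,
which is what makes one of them a candidate for removal.
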


Figure 8.2 Class imbalance comparison (test data ECD).


Figure 8.3 F1 score: SCD vs ECD vs TCD.

Initially, recall is the highest metric while all the others are at their
lowest. As the class imbalance increases, recall drops while F1 score,
accuracy, and precision rise. With a 1:1 class imbalance, the model
predicts fraud instances well but performs poorly on non-fraud cases.
Because the test data contain a substantially higher proportion of
non-fraud than fraud cases, the resulting accuracy is low. Accuracy, the
proportion of correct predictions among all predictions, improves
noticeably as the class imbalance rises, and so does the F1 score. The
reason is that a higher class imbalance gives the model more training
cases, which allows it to generalize more successfully.
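
To illustrate how training sets with different class imbalance ratios can
be produced, the sketch below undersamples the majority class at random.
Random undersampling is a simpler stand-in chosen for this example; it is
not the NearMiss algorithm mentioned in this chapter, and the function
name `undersample` and the label counts are assumptions.

```python
import random


def undersample(labels, ratio, seed=0):
    """Return row indices with all fraud rows (label 1) and at most
    `ratio` randomly chosen non-fraud rows (label 0) per fraud row."""
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(legit), ratio * len(fraud))
    return sorted(fraud + rng.sample(legit, n_keep))


# Hypothetical label vector: 5 fraud and 995 non-fraud samples.
labels = [1] * 5 + [0] * 995

balanced = undersample(labels, ratio=1)     # 1:1 class ratio
imbalanced = undersample(labels, ratio=10)  # 1:10 class ratio
print(len(balanced), len(imbalanced))
```

Raising the ratio keeps more non-fraud rows and therefore gives the model
more training cases, at the cost of a more imbalanced training set.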

Determining which algorithm performs best was one of the key objectives of
this study. Figure 8.4 shows that, although traditional algorithms and
deep learning approaches performed similarly, LSTM performed somewhat
better than the other algorithms. We also observed the following during
our experiments: (i) deep learning methods using the TensorFlow library
and GPU computing reduced training time compared with traditional
algorithms like SVM and RF, especially on big datasets; (ii)
misclassification increased with increasing epoch count; and (iii) all
variants of the NearMiss algorithm gave the same results.


Figure 8.4 Sampling method comparison (validation data ECD).


8.6 CONCLUSION AND FUTURE SCOPE


The prevalence of credit card fraud is on the rise, and fraudsters are
always coming up with new techniques to steal money from banks. Because
fraud patterns are always changing, it is crucial to have a robust
classifier. The main objective of fraud detection systems is to reduce the
number of false positives while increasing the accuracy of fraud case
predictions. The input data has a major impact on the model’s performance,
and the efficacy of ML approaches differs across business cases. A large
number of features and transactions, together with feature correlation,
are critical in detecting credit card fraud. Deep learning techniques such
as CNNs and LSTMs have proven more effective for credit card fraud
detection than standard algorithms. In this investigation, LSTM with 50
blocks had the highest F1 score of 84.85%; however, all algorithms
performed similarly. To address the class imbalance problem, we used
sampling methods, which improved performance on previously seen cases but
made it much worse on unseen data. As the degree of class imbalance
increased, performance on unseen data improved. The hyperparameters used
in building deep learning models are an area that needs further
investigation in future work related to this study.


IoT security and efficiency


Sumaiya Shaikh, Saba Sheiba, 
and Mulagundla Sridevi 


9.1 INTRODUCTION 


Fraud detection is a vital process in safeguarding against the nefarious 
schemes of con artists seeking to unlawfully obtain money or assets through 
deceptive means [1]. In today’s complex economic landscape, where finan- 
cial crimes such as money laundering and fraud pose significant threats to 
the stability of the global economy, the ability to swiftly identify and miti- 
gate fraudulent activities is paramount. Utilizing advanced technologies like 
artificial intelligence (AI) and machine learning (ML), organizations across 
various sectors such as government, banking, insurance, healthcare, and law 
enforcement employ sophisticated techniques to detect and prevent fraud 
before it wreaks havoc [2]. These efforts encompass a diverse array of fraud- 
ulent schemes, ranging from consumer fraud, intellectual property theft, 
and corruption to insurance and banking fraud, asset misappropriation, 
and more. Consumer fraud targets individuals through deceptive practices 
like bogus telemarketing, email scams, and Ponzi schemes, while intellec- 
tual property theft involves the illicit acquisition and trading of proprietary 
information. Corruption encompasses a spectrum of unethical and illegal 
activities such as bribery, extortion, and kickbacks, posing serious chal- 
lenges to governance and transparency [3]. Within the financial realm, fraud 
manifests in various forms, including insurance fraud through false claims, 
fraudulent bankruptcies, and asset misappropriation such as skimming cash 
or misusing company resources. Additionally, authorized push payment and 
account takeover schemes exploit vulnerabilities in payment systems and 
financial accounts to siphon funds illicitly [4]. Other common types of fraud 
include phishing, identity theft, telephone or utility fraud, investment fraud, 
and lottery or sweepstakes scams, each posing unique threats to individu- 
als and organizations alike. Phishing exploits networks and email platforms 
to deceive victims into divulging sensitive information, while identity theft 
involves the unauthorized access to personal data for criminal purposes [5]. 
Telephone or utility fraud relies on impersonation tactics to extract private 
information or illicit payments, while investment fraud dupes victims into
investing in fraudulent schemes or chit funds, leading to financial
losses. Similarly, lottery or sweepstakes fraud preys on individuals’
hopes by promising non-existent prizes in exchange for upfront fees or
taxes. The fight against fraud encompasses a multifaceted approach,
integrating strategies to combat money laundering, cyberattacks, forged
documents, and other illicit activities [6]. By employing advanced
technologies and robust preventive measures, organizations can mitigate
the risks posed by fraudulent actors, safeguarding the integrity of
financial systems and protecting individuals and businesses from economic
harm [7] (Figure 9.1).


Figure 9.1 Fraud detection.

Data warehousing serves as a crucial pillar in the realm of data man- 
agement, particularly tailored to bolster and facilitate business intelligence 
endeavors. It consolidates diverse datasets within a centralized reposi- 
tory, the data warehouse, primarily tasked with facilitating queries and 
analyses on historical data [8]. Its fundamental objective lies in
extracting pertinent insights. However, this reservoir of information is
also a double-edged sword, as malicious actors can exploit it to devise
intricate schemes that stay below detection thresholds. The creation of
association rules emerges as a pivotal strategy in navigating the vast
expanse of data [9]. By discerning frequent if-then patterns, these rules
unveil significant relationships. The computation of support and
confidence metrics further refines these associations, empowering
organizations to unravel critical insights buried within their data
reservoirs. Subsequently, leveraging IF/ELSE patterns to scrutinize the
established association rules unveils
underlying customer approval patterns. This analytical framework not
only aids in understanding customer behaviors but also fortifies customer 
authentication mechanisms. Through meticulous analysis, fraudulent 
attempts to circumvent authentication measures can be thwarted, bolster- 
ing system security [10]. Customer authentication stands paramount in the 
battle against fraudulent activities, as breaches often stem from identity 
failures. Successful authentication ensures system integrity, whereas fail- 
ures open the floodgates to potential breaches. To counteract such vulnera- 
bilities, authentication processes must emit signals or alerts, notifying both 
the system and the customer of any potential compromise [11]. In the event 
of system failures or authentication breaches, proactive measures come into 
play. The system must promptly generate alerts, signaling unauthorized or 
anomalous activities. These alerts serve as early warnings, enabling swift 
intervention to mitigate potential threats and safeguard organizational 
assets [12]. In essence, the synergy between data warehousing, association 
rule creation, IF/ELSE pattern analysis, customer authentication, and alert 
mechanisms forms a robust defense against fraudulent activities [13]. By 
harnessing the power of data and deploying sophisticated analytical tech- 
niques, organizations can stay one step ahead of fraudsters, ensuring the 
integrity and security of their systems and data assets. 
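
The support and confidence of an if-then rule can be computed directly
from a set of records. The sketch below is a minimal illustration in plain
Python; the event names (`new_device`, `foreign_ip`, `auth_failed`, and so
on) and the five login records are hypothetical and serve only to show the
arithmetic.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule IF antecedent
    THEN consequent; returns 0.0 when the antecedent never occurs."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical login records, each a set of observed events.
records = [
    {"new_device", "foreign_ip", "auth_failed"},
    {"new_device", "foreign_ip", "auth_failed"},
    {"new_device", "auth_ok"},
    {"known_device", "auth_ok"},
    {"known_device", "foreign_ip", "auth_ok"},
]

rule_if = {"new_device", "foreign_ip"}
rule_then = {"auth_failed"}

print(support(records, rule_if))                   # how often the IF part occurs
print(confidence(records, rule_if, rule_then))     # how often THEN follows IF
print(confidence(records, {"foreign_ip"}, rule_then))
```

In this invented data, a foreign IP address alone is a weaker signal of
failed authentication than a foreign IP address combined with a new
device, which is the kind of relationship that an alerting rule could be
built on.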


9.2 RELATED WORK 


The integration of blockchain technology with big data analytics for bol- 
stering Internet of Things (IoT) security has garnered significant attention 
in recent years. A plethora of research endeavors and scholarly works have 
delved into this interdisciplinary domain, aiming to explore the synergies 
between these cutting-edge technologies [14]. Here, we present an exten- 
sive review of the existing literature, encompassing various studies, surveys, 
and practical perspectives on this compelling subject matter. One notable 
contribution to the field comes from Doe and Smith, who conducted a com- 
prehensive review elucidating the potential of blockchain-enabled big data 
analytics in fortifying IoT security [15]. Their work elucidates various use 
cases, challenges, and opportunities, emphasizing the pivotal role of block- 
chain in ensuring data integrity, confidentiality, and traceability within IoT 
ecosystems. Similarly, Johnson and Williams offer a survey that provides a 
panoramic overview of the synergies between blockchain, big data analyt- 
ics, and IoT security. Their study explores the integration of blockchain’s 
decentralized ledger for securing IoT data exchanges, complemented by the 
role of big data analytics in real-time threat identification [16]. 

The authors in Ref. [17] contribute to the literature with a review focus- 
ing on the integration of blockchain and big data analytics for securing 
IoT applications. Their work highlights the symbiotic relationship between 
these technologies, emphasizing blockchain’s role in establishing trust 
and immutability, coupled with big data analytics’ capacity for predictive
security analytics. Furthermore, the authors in Ref. [18] present a
systematic literature review synthesizing key findings from existing
research on
securing IoT devices through blockchain and big data analytics integration. 
Their study underscores the significance of data integrity, access control, 
and device authentication in ensuring robust IoT security frameworks. 

The authors in Ref. [19] shed light on the challenges and opportunities 
inherent in blockchain-based big data analytics for IoT security. Their work 
addresses technical, regulatory, and scalability issues, emphasizing the 
importance of interoperability and data privacy in designing effective solu- 
tions. The authors in Ref. [20] offer a practical perspective, providing case 
studies and implementation strategies for deploying blockchain-enabled 
IoT security solutions. Their work emphasizes best practices and real-world 
challenges, offering insights gleaned from practical deployments. 

In Ref. [21], the authors propose a novel framework integrating 
blockchain-driven big data analytics for IoT security enhancement. Their 
work outlines architectural components and security mechanisms, empha- 
sizing scalability and resilience to cyber threats. Additionally, the authors 
in Ref. [22] conduct a comparative analysis of different approaches to IoT 
security enhancement, evaluating the effectiveness of blockchain and big 
data analytics integration. Their study offers insights into consensus mech- 
anisms, data processing techniques, and scalability considerations, aiding 
in the design of resilient IoT security solutions. 

In Ref. [23], the authors offer a comparative analysis that sheds light 
on the effectiveness of different approaches to IoT security enhancement. 
Their study evaluates various consensus mechanisms, data processing tech- 
niques, and scalability considerations inherent in blockchain and big data 
analytics integration. By examining the trade-offs and performance metrics 
associated with each approach, they provide valuable insights into design- 
ing resilient and efficient IoT security solutions tailored to specific use cases 
and deployment scenarios. 

Furthermore, the authors in Ref. [24] propose a novel framework that 
leverages blockchain-driven big data analytics to enhance IoT security. 
Their framework delineates architectural components, data flow mecha- 
nisms, and security protocols designed to withstand cyber threats and 
ensure data integrity across IoT ecosystems. By integrating blockchain’s 
decentralized ledger with big data analytics’ processing capabilities, their 
framework offers a scalable and resilient solution for securing IoT devices 
and data transmissions. 

In addition to addressing technical challenges, ethical considerations and 
user privacy concerns are paramount in the integration of blockchain with 
big data analytics for IoT security enhancement. The authors in Ref. [25]
delve into the ethical implications of deploying blockchain-enabled IoT
security solutions, emphasizing the importance of transparent governance
models, consent frameworks, and data protection mechanisms. By fostering
trust and accountability, ethical considerations play a crucial role in 
ensuring the acceptance and adoption of blockchain-driven IoT security 
frameworks by stakeholders and end-users alike. 

Moreover, practical perspectives provided in Ref. [26] offer invaluable 
insights into the deployment challenges and real-world applications of 
blockchain-enabled big data analytics for IoT security. Through case stud- 
ies and implementation strategies, they elucidate best practices for over- 
coming technical hurdles, integrating legacy systems, and aligning security 
measures with organizational objectives. By bridging the gap between the- 
ory and practice, their work facilitates the adoption and implementation of 
blockchain-driven IoT security solutions in diverse industry domains [27]. 

Finally, the exploration of integrating blockchain with big data analytics 
for enhanced IoT security and efficiency encompasses a broad spectrum of 
research endeavors, spanning theoretical frameworks, comparative analy- 
ses, practical implementations, and ethical considerations [28]. By address- 
ing technical challenges, regulatory compliance requirements, and user 
privacy concerns, researchers are paving the way for innovative solutions 
that safeguard IoT ecosystems against emerging cyber threats while unlock- 
ing new opportunities for operational optimization and value creation. 


9.3 ENHANCED IOT SECURITY AND EFFICIENCY 
WITH BLOCKCHAIN TECHNOLOGY 


Detecting fraudulent activities within IoT ecosystems has become increas- 
ingly crucial, necessitating the integration of blockchain technology with 
big data analytics to enhance IoT security and efficiency. The convergence 
of these cutting-edge technologies offers a comprehensive approach to ana- 
lyzing vast and heterogeneous datasets, enabling proactive detection of 
anomalies and suspicious patterns indicative of fraudulent behavior [29]. 

The integration of blockchain with big data analytics for enhanced IoT 
security and efficiency entails a systematic process aimed at leveraging the 
strengths of each technology. Initially, data collection involves aggregating 
information from various IoT devices, sensors, and systems, including user 
logs, device telemetry, and environmental data. This data, whether struc- 
tured or unstructured, provides valuable insights into device behavior, user 
interactions, and network activities [30]. 

Data integration is crucial in consolidating disparate datasets into a 
unified format, enabling seamless analysis and correlation of information 
across different sources. Technologies such as Apache Hadoop and Apache 
Spark play a vital role in processing and integrating these large-scale
datasets, facilitating comprehensive analysis and decision-making. Data
preprocessing involves refining the dataset to address missing values,
outliers, and inconsistencies, ensuring data quality and reliability.
Standardization of data formats and normalization of variables are
essential steps to enhance the accuracy of fraud detection algorithms.
Feature engineering further enhances the dataset by extracting relevant
features and creating new variables that capture meaningful insights into
device behavior and network activities. ML techniques are employed to
build predictive models for fraud detection, leveraging historical data to
identify patterns and anomalies indicative of fraudulent behavior.
Techniques such as neural networks, support vector machines, and decision
trees are commonly used to train these models, enabling automated
detection of suspicious activities in real time. Real-time processing is
essential for timely detection of and response to fraudulent activities,
leveraging technologies such as Apache Flink and Apache Kafka for stream
processing and analysis. These real-time processing frameworks enable
rapid detection of anomalies and deviations from normal behavior,
facilitating immediate intervention to mitigate potential threats
(Figure 9.2).


Figure 9.2 Workflow of fraud transaction detection.

Anomaly detection techniques, including statistical methods and cluster- 
ing algorithms, are employed to identify unusual patterns or behaviors that 
may indicate fraudulent activity within IoT ecosystems. Behavioral analysis 
establishes baseline behavior for devices and users, enabling the detection 
of deviations and anomalies that may signify security breaches or fraudu- 
lent behavior. Graph analytics is utilized to analyze complex relationships 
and networks within IoT ecosystems, identifying suspicious connections 
and activities that may indicate coordinated attacks or fraudulent behav- 
ior. Scalability and performance are crucial considerations in deploying 
blockchain-enabled big data analytics solutions for IoT security, requiring 
the use of distributed computing frameworks and cloud-based solutions to 
handle large-scale data processing efficiently. Continuous monitoring and 
adaptation are essential to stay ahead of emerging threats and evolving 
attack vectors, requiring regular updates and retraining of ML models to
detect new patterns of fraudulent behavior. Collaboration and information 
sharing with industry peers and stakeholders enhance fraud detection capa- 
bilities, enabling the collective intelligence of the community to identify and 
mitigate potential threats. 
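
A statistical baseline of the kind described above can be sketched with a
z-score test: a reading is flagged when it lies too many standard
deviations away from a device’s normal behavior. This is a minimal
illustration, not a production detector; the function name
`find_anomalies`, the threshold of 3, and the temperature readings are
assumptions.

```python
import statistics


def find_anomalies(baseline, readings, threshold=3.0):
    """Return (index, value) pairs whose z-score against the baseline
    exceeds `threshold`."""
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    if std == 0:
        return []  # a constant baseline gives no scale to compare against
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mean) / std > threshold]


# Hypothetical sensor telemetry in degrees Celsius (invented values).
baseline = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]  # normal behavior
readings = [20.1, 19.7, 35.4, 20.2]              # new data to screen

print(find_anomalies(baseline, readings))
```

In a streaming deployment the baseline would be refreshed continuously, in
line with the need for regular updates noted above.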


9.3.1 Applications of big data analytics 


Big data plays a vital role in fraud detection across industries because
it can process and analyze large volumes of data. Some applications of big
data analytics in fraud detection are shown in Figure 9.3.


9.4 FRAUD DETECTION WITH 
BLOCKCHAIN TECHNOLOGY 


Blockchain is a decentralized ledger technology that can be used to record
transactions transparently and securely. Its intrinsic qualities make it
well suited to tracking financial transactions and identifying
questionable activity. In today’s evolving financial landscape,
safeguarding against fraud has emerged as a concern for institutions
striving to uphold trust and security. Traditional approaches have often
fallen short due to their limitations in transparency, data integrity, and
operational efficiency. Blockchain has emerged as a solution with the
potential to reshape anti-financial-crime strategies. Some key
characteristics and components of blockchain technology are as follows.

Blockchain technology is a revolutionary system characterized by several 
key features. At its core, blockchain operates on a decentralized network of 
computers, or “nodes,” eliminating the need for a central authority to over- 
see transactions. Each node in the network maintains a complete copy of 
the ledger, known as a distributed ledger, ensuring redundancy and trans- 
parency across the system. 

Transactions are organized into blocks, with cryptographic hashes link- 
ing each block to the one preceding it, creating a chronological and secure 
record of all network activity. This block-by-block sequence forms the 
blockchain, providing an immutable and transparent transaction history. 
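
The hash linking described above can be illustrated with a small sketch. It assumes SHA-256 and a JSON serialization of each block, and the block fields and sample transactions are hypothetical; real platforms use their own block formats.

```python
import hashlib
import json


def block_hash(index, transactions, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Add a block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(index, transactions, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link back to the first block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["transactions"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
append_block(chain, ["carol pays dave 1"])
valid_before = is_valid(chain)

# Tampering with an earlier block breaks its stored hash.
chain[1]["transactions"] = ["bob pays mallory 200"]
valid_after = is_valid(chain)
```

An attacker who also recomputed the hash of the altered block would then break the link stored in the next block, so every later block would have to be rewritten as well.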

To confirm the current state of the ledger, blockchain employs consensus 
mechanisms such as proof of stake or proof of work. These mechanisms 
ensure agreement among network participants regarding the validity of 
transactions, maintaining the integrity of the ledger. 

Furthermore, cryptographic hash functions are utilized to validate the
data within each block, ensuring its integrity. Any attempt to alter the
data within a block would change that block’s hash and therefore require
recomputing all subsequent blocks, making tampering impractical on a
well-secured network.

Figure 9.3 Applications of big data in fraud detection.

Figure 9.4 Process of verifying the fraud transaction in blockchain technology.


In essence, blockchain technology offers a secure and transparent method 
for recording and verifying transactions across various industries, promis- 
ing to revolutionize traditional systems by enhancing trust, efficiency, and 
accountability. 

The process of fraud detection leveraging blockchain technology 
is illustrated in Figure 9.4. Initially, customer databases store records 
containing user information, forming the foundation for fraud detec- 
tion within transactions. These records serve as vital repositories for the 
data required to identify anomalies and potentially fraudulent activi- 
ties. The next step involves harnessing the collective information of 
users for processing through advanced technologies such as ML and AI. 
This integration allows for a comprehensive analysis of data, enabling 
the identification of patterns, trends, and irregularities that may sig- 
nify fraudulent behavior. Subsequently, the data undergoes analysis 
utilizing techniques tailored to the processing requirements. Advanced 
algorithms sift through the information, employing sophisticated meth- 
odologies to discern fraudulent activities accurately. Through meticulous 
examination, these techniques enable the prediction of potential fraud 
instances, empowering proactive measures to mitigate risks. Throughout 
this process, auditors play a pivotal role in overseeing the authentication 
procedures. They are responsible for validating transactions, ensuring 
compliance with established protocols, and providing approval, denial, 
or additional scrutiny as deemed necessary. In cases where abnormal 
financial transactions are flagged, auditors are promptly notified to verify 
the authenticity of the transaction. This iterative approach underscores 
the importance of continuous monitoring and scrutiny in the realm of 
fraud detection. By leveraging blockchain technology in conjunction with 
ML, AI, and rigorous auditing protocols, organizations can bolster their 
defenses against fraudulent activities, safeguarding assets and maintain- 
ing the integrity of financial transactions. 
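
The flag-then-audit flow described above might be sketched as follows. The `risk_score` field stands in for the output of an upstream ML model, and the threshold, the `example_auditor` rules, and the sample transactions are illustrative assumptions; in practice the auditor is a person or a review team, not a function.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str
    amount: float
    risk_score: float  # assumed output of an upstream ML model, in [0, 1]


def flag_abnormal(transactions, threshold=0.7):
    """Select the transactions whose risk score warrants an auditor's attention."""
    return [tx for tx in transactions if tx.risk_score >= threshold]


def audit(flagged, auditor):
    """Record the auditor's decision for each flagged transaction."""
    return {tx.tx_id: auditor(tx) for tx in flagged}


def example_auditor(tx):
    """Stand-in for a human decision: approve, deny, or ask for more scrutiny."""
    if tx.amount > 10_000:
        return "deny"
    if tx.amount > 1_000:
        return "needs further scrutiny"
    return "approve"


batch = [
    Transaction("t1", 40.0, 0.05),
    Transaction("t2", 2_500.0, 0.80),
    Transaction("t3", 50_000.0, 0.95),
    Transaction("t4", 300.0, 0.75),
]
flagged = flag_abnormal(batch)
outcomes = audit(flagged, example_auditor)
```

Keeping the scoring step separate from the auditor's decision mirrors the division of labor in Figure 9.4: the model only raises flags, and the final outcome is recorded by the auditor.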


9.4.1 Applications of fraud detection 
in blockchain technology 


Blockchain-based fraud detection has enormous potential to improve
security, transparency, and trust in a variety of businesses. The
following are some uses of blockchain-based fraud detection.

Blockchain technology offers a wide array of applications for fraud 
detection across various sectors, revolutionizing the way we approach 
security and transparency. One of the key areas where blockchain shines 
is in auditing smart contracts. These self-executing contracts, inherent 
to blockchain, undergo meticulous scrutiny to uncover security flaws or 
malicious code. By implementing fraud detection methods, organizations 
can identify and address potential issues early on, minimizing the risk 
of exploitation. Moreover, blockchain enhances supply chain transpar- 
ency by enabling traceability of items from their origin to the end con- 
sumer. Fraud detection mechanisms ensure that the data recorded on 
the blockchain aligns with the actual flow of goods, swiftly identifying 
any discrepancies or attempts to manipulate information. In the realm of 
financial services, blockchain simplifies Anti-Money Laundering (AML) 
and Know Your Customer (KYC) procedures by establishing a decentral- 
ized identity verification system. Fraud detection algorithms track and 
flag suspicious transactions, bolstering security and compliance efforts. 
Cross-border payments benefit from blockchain’s secure and transpar- 
ent ledger, with fraud detection mechanisms monitoring transactions for 
anomalies, safeguarding funds and verifying the legitimacy of involved 
parties. Tokenization and asset management, facilitated by blockchain, 
rely on fraud detection to monitor ownership and transfer of assets, ensur- 
ing authorized transactions and preventing fraud in areas like real estate 
and art. Decentralized Finance (DeFi) applications utilize fraud detection 
to scrutinize decentralized exchanges and lending protocols for suspicious 
activities, unauthorized access, and potential smart contract exploits. 
Blockchain’s secure digital identity verification platform is instrumental 
in detecting identity theft and fraudulent activities through analysis of 
identity-related transactions and patterns. In the insurance sector, block- 
chain records transparent and tamper-proof claims data, while fraud 
detection algorithms identify irregularities, reducing insurance fraud and 
streamlining claims processing.


9.4.2 Advantages of BCT in detecting frauds


Blockchain technology offers numerous advantages for detecting and pre- 
venting fraud, revolutionizing the way we approach security and account- 
ability in transactions. Some of the key advantages include: 


Transparency and Traceability: The inherent nature of blockchain 
ensures a transparent and auditable trail of transactions. Every trans- 
action is recorded chronologically, providing participants with access 
to the entire transaction history. This transparency facilitates the 
detection of fraud by enabling quick identification and investigation 
of any suspicious or unauthorized activity. 

Decentralization: Unlike traditional centralized systems vulnerable to 
fraud due to a single point of failure, blockchain distributes the ledger 
across a network of nodes, making it decentralized. This decentral- 
ization significantly reduces the risk of fraudulent activity or illegal 
access, as there is no single point that can be exploited. 

Enhanced Security: Blockchain employs cryptographic techniques such as
hashing and digital signatures to authenticate users and protect
transactions. By bolstering data integrity and authenticity, this
cryptographic layer fortifies the blockchain against fraud attempts such
as unauthorized access or data tampering.

Real-Time Monitoring and Alerts: Integration of fraud detection algo- 
rithms and real-time monitoring tools with blockchain networks 
enables instant notifications in response to unusual or suspicious 
activity. This proactive approach empowers organizations to swiftly 
investigate and take action against potential fraud attempts, minimizing
the impact of fraudulent activities.

Overall, blockchain technology’s transparency, decentralization, security
features, and real-time monitoring capabilities make it a powerful tool
for detecting and preventing fraud across various industries, ensuring
integrity and trust in transactions and processes.


9.4.3 Disadvantages of using blockchain 
technology in fraud detection 


While blockchain technology presents numerous advantages for fraud 
detection, it also comes with its fair share of challenges and drawbacks that 
need to be addressed. Some specific disadvantages include: 


Limited Regulation and Standards: The absence of standardized regu- 
lations and industry standards for blockchain technology creates 
uncertainties in legal and compliance frameworks. This lack of clear 
guidelines may impede the adoption of blockchain for fraud detec- 
tion, particularly in heavily regulated industries where adherence to 
compliance standards is crucial.

Privacy Concerns: Although transparency aids auditing, blockchain’s public nature
can raise privacy concerns. Many blockchain networks expose sensi- 
tive transaction details to all participants, potentially compromising 
user privacy. Balancing transparency for fraud detection with the need 
to protect user privacy poses a significant challenge in blockchain 
implementation. 

Irreversibility of Transactions: Once recorded on the blockchain, trans- 
actions are typically irreversible. While immutability is a core strength 
of blockchain, it becomes a drawback when dealing with erroneous 
transactions or instances where fraud is detected post-transaction. 
Unlike traditional systems with chargeback mechanisms, blockchain 
transactions are final, making rectification challenging. 


9.5 FRAUD DETECTION USING 
ARTIFICIAL INTELLIGENCE 


AI has emerged as a crucial tool in the fight against fraud, leveraging 
advanced algorithms and ML techniques to detect and prevent fraudulent 
activities. The key areas in which AI is applied to fraud detection are:


Anomaly Detection: AI algorithms, particularly unsupervised ML mod- 
els, establish patterns of normal behavior and flag deviations as anom- 
alies, potentially indicating fraudulent activity. This approach is vital 
for identifying previously unknown and evolving forms of fraud. 

Machine Learning Models: Supervised ML models are trained on his- 
torical data, including both legitimate and fraudulent transactions. 
These models learn to recognize patterns associated with fraud and 
can predict the likelihood of a transaction being fraudulent in real 
time, using algorithms like decision trees, random forests, and sup- 
port vector machines. 

Behavioral Analytics: AI-driven behavioral analytics analyze patterns
of user behavior across various channels to detect anomalies that 
may indicate fraudulent activity, especially in online banking and 
e-commerce. This helps in identifying deviations from established 
behavioral patterns. 

Predictive Analytics: Predictive analytics forecast future events using his- 
torical data and statistical algorithms. In fraud detection, predictive 
models assess the likelihood of a transaction or activity being fraudu- 
lent based on patterns identified in historical data, enabling proactive 
prevention of fraud. 

Real-Time Monitoring: AI systems enable real-time monitoring of transac- 
tions, user activities, and system logs. By continuously analyzing incom- 
ing data, AI algorithms quickly identify and respond to suspicious 
behavior, reducing the time needed to detect and prevent fraud.

Biometric Authentication: AI-powered biometric authentication methods,
such as facial recognition and fingerprint scanning, add an extra
layer of security to fraud prevention efforts. These technologies ensure 
that individuals accessing systems or making transactions are authen- 
ticated securely. 

Robotic Process Automation (RPA): RPA automates repetitive tasks in 
fraud detection processes, reducing manual effort and ensuring con- 
sistency. By automating routine tasks, RPA speeds up response times 
and improves fraud detection protocols. 

Adaptive Learning: AI systems adapt and evolve based on new data and 
emerging fraud patterns. This adaptability allows the system to stay 
effective in dynamic environments where fraudsters continually modify
their tactics.

The integration of AI in fraud detection not only enhances the accuracy
of identifying fraudulent activities but also enables organizations to
stay ahead of evolving fraud schemes. As fraudsters become more
sophisticated, AI technologies provide valuable tools for creating robust
and adaptive fraud detection systems.
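
To make the supervised approach in the list above concrete, the sketch below trains a decision stump, which is a decision tree with a single split, on labeled transactions. It is a deliberately tiny illustration: it uses one feature (the amount), the training rows are invented, and a real system would use many features and an established ML library.

```python
def train_stump(amounts, labels):
    """Pick the amount threshold that misclassifies the fewest training rows.

    The stump predicts fraud (1) when amount > threshold, otherwise legitimate (0).
    """
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(amounts)):
        errors = sum(
            (amount > threshold) != bool(label)
            for amount, label in zip(amounts, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict(threshold, amount):
    """Apply the learned split to a new transaction amount."""
    return int(amount > threshold)


# Hypothetical labeled history: 1 marks a transaction later confirmed as fraud.
amounts = [12, 40, 55, 80, 950, 1200, 30, 2000]
labels = [0, 0, 0, 0, 1, 1, 0, 1]

threshold, training_errors = train_stump(amounts, labels)
large_tx = predict(threshold, 1500)
small_tx = predict(threshold, 25)
```

Random forests extend the same idea by combining many deeper trees trained on different samples of the data.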


9.6 CONCLUSION 


In conclusion, fraud detection and financial crime prevention are indis- 
pensable components of safeguarding the integrity and security of finan- 
cial systems. The constantly changing terrain of cyberthreats necessitates 
the implementation of robust and adaptive measures, including behavioral 
analysis, anomaly detection, encryption, and incident response plans. 
Real-world examples, such as the Target data breach, underscore the 
importance of constant vigilance and the need for financial institutions to 
stay ahead of sophisticated adversaries. Collaboration with law enforce- 
ment, transparent customer communication, and ongoing education efforts 
contribute to a comprehensive strategy. While no system is completely 
impervious to attacks, a well-rounded and dynamic approach is essential to 
mitigate risks, protect customer trust, and maintain the stability of
financial ecosystems.


10.1 INTRODUCTION TO MACHINE
LEARNING AND DEEP LEARNING 


10.1.1 Fundamentals of ML/DL 


The concepts, techniques, and methodologies that are essential for under- 
standing and harnessing the power of artificial intelligence encompass a vast 
and complex landscape within deep learning (DL) and machine learning (ML) 
[9]. The fundamental principle behind ML is to allow computer systems to 
learn from data without the need for explicit programming. The two primary 
forms of learning are unsupervised learning, which is the process of draw- 
ing patterns and structures from unlabeled data, and supervised learning, 
in which algorithms learn from labeled instances to generate predictions or 
judgments. Another paradigm, reinforcement learning, emphasizes learning 
through interaction with an environment, with rewards guiding the learning 
process. Because DL, a subset of ML, can automatically extract hierarchical 
representations from data, it has emerged as a major player in the field. The 
core elements of DL are neural networks, which replicate the networked orga- 
nization of neurons in the human brain. They are made up of node layers that 
process and alter data. Activation functions introduce nonlinearity into neural 
network architectures, enhancing their ability to model complex relationships 
in data. Training neural networks involves iteratively optimizing through 
backpropagation, where gradients are computed and used to update model 
parameters, minimizing the difference between predicted and actual outputs. 
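
The gradient-based update at the heart of training can be shown on the smallest possible model, a single weight w fitted so that w * x approximates y. The data, learning rate, and step count below are assumptions chosen so that the example converges quickly; a real network has many parameters, and backpropagation computes the gradient for all of them.

```python
def train(xs, ys, learning_rate=0.05, steps=100):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of L(w) = mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Move the parameter a small step against the gradient.
        w -= learning_rate * grad
    return w


# Toy data generated by the rule y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = train(xs, ys)
```

Each step shrinks the gap between predicted and actual outputs, which is the behavior the paragraph above describes.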

Recurrent neural networks (RNNs) are adept at processing sequential 
data, which makes them appropriate for tasks like natural language pro- 
cessing (NLP) and time series prediction. DL architectures, such as convo- 
lutional neural networks (CNNs), thrive in tasks like image recognition and 
computer vision by utilizing spatial hierarchies. Regularization strategies
like dropout reduce overfitting and improve generalization to new data,
while optimization approaches like gradient descent and its variants adjust
model parameters. Data preprocessing is a crucial step in preparing data for
ML/DL models. It entails cleaning, transforming, and encoding data.
Evaluation and validation techniques are used to assess model performance,
guiding the selection of appropriate algorithms and hyperparameters. Ethical
considerations are significant in the ML/DL landscape, with concerns about
bias, fairness, and transparency leading to calls for responsible AI
development. As the field progresses, advanced topics like generative
adversarial networks (GANs) and reinforcement learning offer new
possibilities, while ongoing research and collaboration contribute to
innovative applications and societal impact.


10.1.2 Overview of neural networks and 
deep learning architectures 


DL is based on neural networks, which offer a computational model inspired
by the structure and operation of the human brain. These networks are made
up of artificial neurons: interconnected nodes arranged into layers that
process and transform incoming data. DL architectures extend this approach
with many layers, which enables the automatic extraction of hierarchical
representations from data. In a conventional feedforward neural network,
information travels from input nodes to output nodes, passing through
hidden layers that apply a series of weighted transformations to the
input. Activation functions introduce nonlinearity, which allows the
network to learn intricate relationships in the data. CNNs use
convolutional layers to extract spatial hierarchies of features, which
allows them to handle grid-like data, such as photographs. Tasks like
semantic segmentation, object detection, and image classification have
been revolutionized by these systems. RNNs, by contrast, are designed
for processing sequential input. They use directed cycles formed by node
connections to capture temporal dependencies. RNNs are effective at tasks
like speech recognition, NLP, and time series prediction because of their
capacity to handle inputs of different lengths and to model sequences. A
variant of RNNs, long short-term memory (LSTM) networks mitigate the
vanishing gradient problem by implementing gating mechanisms that control
the information flow over time. These architectures are now essential for
tasks like sentiment analysis and language translation that call for the
preservation of memory and context. In conclusion, advances in a variety
of fields have been fueled by the flexibility and adaptability of neural
networks and DL architectures, which have driven innovation across the
artificial intelligence sector.
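
A forward pass through a small feedforward network shows why the nonlinearity matters. In this sketch the weights are set by hand rather than learned, and the layer sizes and names are assumptions made for the example. With a ReLU activation in the hidden layer, the network computes XOR, a function that no purely linear model can represent.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weighted sum plus bias, then optional activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


# Hand-picked parameters (not learned) for a 2-2-1 network.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(x1, x2):
    """Input layer -> hidden layer with ReLU -> linear output layer."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, activation=relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


xor_table = {(a, b): forward(a, b) for a in (0, 1) for b in (0, 1)}
```

Removing the `relu` call makes the whole network a single linear map, and the XOR outputs can no longer be reproduced.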


10.1.3 Applications of ML/DL in various industries 


ML and DL have been widely adopted in various industries, providing
innovative solutions to intricate problems and driving efficiency and
optimization across a diverse range of sectors.

In the field of healthcare, ML/DL applications encompass a wide range
of uses, from the analysis of medical images for disease diagnosis to predic- 
tive analytics for patient outcomes and personalized treatment plans. ML 
models examine extensive medical data to detect patterns and forecast the 
progression of diseases, leading to a transformative impact on healthcare 
delivery and improved patient outcomes. 

Financial institutions use ML and DL for algorithmic trading, risk man- 
agement, and fraud detection. Massive datasets are combed through by ML 
algorithms to spot fraudulent activity in financial transactions, and risk 
management models assess market and credit risks to help in decision-mak- 
ing. Algorithmic trading strategies employ DL techniques to analyze market 
data and execute trades automatically, thereby optimizing investment port- 
folios and maximizing returns. 

Retailers employ ML/DL for personalized recommendations, inventory 
management, and dynamic pricing. Recommendation systems analyze cus- 
tomer behavior to provide tailored product suggestions, while ML models 
forecast demand and optimize inventory levels to prevent stockouts and 
reduce costs. In order to increase sales and profitability, dynamic pricing 
algorithms modify prices in real time in response to consumer preferences 
and market conditions. 

ML/DL is used by manufacturing organizations for supply chain optimi- 
zation, quality control, and predictive maintenance. While DL algorithms 
check items for flaws and deviations from quality standards, predictive 
maintenance models use sensor data from equipment to anticipate break- 
downs and reduce downtime. ML techniques optimize supply chain opera- 
tions by forecasting demand, managing inventory, and improving logistics 
efficiency. 

In the transportation sector, ML/DL powers autonomous vehicles, opti- 
mizes route planning, and enhances public transit management. Self-driving 
cars rely on DL algorithms to process sensor data and make real-time driv- 
ing decisions, while ML models optimize transportation routes based on 
traffic patterns and weather conditions. Public transit systems benefit from 
ML techniques for predicting passenger demand, optimizing schedules, and 
improving service reliability. 

Across the energy sector, ML/DL applications encompass predictive 
maintenance for infrastructure, energy consumption forecasting, and the 
optimization of smart grids. ML models predict equipment failures in 
energy infrastructure, optimize energy generation and distribution, and 
balance supply and demand on smart grids, contributing to sustainability 
and efficiency in energy management. 

As seen in Figure 10.1, DL-based image analysis and recognition make it
possible to identify objects, people, or actions in an image.

Figure 10.1 Key benefits of deep learning techniques in various fields [1].


10.2 FOUNDATIONS OF BLOCKCHAIN TECHNOLOGY 


10.2.1 Basics of blockchain technology 


The foundational ideas of blockchain technology lay the framework for 
understanding its decentralized and immutable characteristics, which are 
essential for its widespread adoption and acceptance. Fundamentally, a 
blockchain operates as a distributed ledger that documents transactions 
throughout a network of linked nodes, guaranteeing trust, security, and 
transparency without the need for middlemen. Every block in the blockchain 
consists of a collection of transactions that are cryptographically connected 
to the block before it, creating a sequential arrangement of blocks. Through
the use of cryptographic hashing algorithms, this interlinking creates an
immutable record of transactions, making it extremely difficult to alter
previous data without the approval of the majority of network users [4].

Blockchain operates on a network architecture that is decentralized,
whereby every participant, or node, maintains a complete copy of the entire 
blockchain ledger. This decentralization guarantees resilience against sin- 
gular points of failure and prevents unauthorized manipulation of data, 
thereby enhancing the security and dependability of the network. 

By facilitating agreement among network participants regarding the 
authenticity of transactions, consensus mechanisms are essential to main- 
taining the integrity of the blockchain. Two popular consensus techniques 
are Proof of Stake (PoS) and Proof of Work (PoW). In order to validate
transactions and add new blocks to the blockchain, PoW requires
participants (miners) to solve computationally expensive puzzles, whereas
PoS chooses block validators based on the stake they have committed and
rewards honest behavior financially.
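
The two mechanisms can be contrasted in a short sketch. Both functions are simplified illustrations under stated assumptions: the PoW puzzle is "find a nonce whose SHA-256 digest starts with a given number of zero hex digits", and the PoS selection is a stake-weighted random draw. The stake values, block text, and seed are hypothetical, and real protocols add many safeguards that are omitted here.

```python
import hashlib
import random


def mine(block_data, difficulty=3):
    """Proof of Work: search for a nonce whose hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def pick_validator(stakes, rng):
    """Proof of Stake: choose a validator with probability proportional to stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


block_data = "block 42: alice pays bob 5"
nonce, digest = mine(block_data)

stakes = {"alice": 60, "bob": 30, "carol": 10}
rng = random.Random(7)  # fixed seed so the draw is repeatable
picks = [pick_validator(stakes, rng) for _ in range(1000)]
```

Finding the nonce takes many hash attempts, while checking it takes one, and this asymmetry is what makes PoW costly to attack yet cheap to verify. In the PoS draw, no puzzle is solved at all, and influence comes from the committed stake instead.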

Smart contracts are another essential component of blockchain technology.
They are self-executing agreements with predefined terms and conditions
built into the network. These programmable contracts eliminate the need
for middlemen and streamline a variety of corporate operations by
automatically carrying out actions when predetermined conditions are
satisfied.

Blockchain technology has numerous uses in a variety of sectors, such 
as supply chain management, healthcare, and finance. It makes transac- 
tions safer and more transparent, lowers the risk of fraud and error, and 
promotes the development of decentralized apps (DApps) and new busi- 
ness models. To fully realize the disruptive potential of blockchain technol- 
ogy and explore its many applications in the modern digital economy, one 
must acquire a thorough understanding of its fundamentals. Figure 10.2 
highlights the distinctive advantages of blockchain in comparison to other 
technologies. 


10.2.2 Types of blockchains 


Blockchain technology has developed to accommodate various use cases, 
resulting in different categories of blockchains. These categories can be 
broadly classified into three main groups: public blockchains, private block- 
chains, and consortium blockchains. Each group possesses its own distinct 
characteristics, advantages, and use cases. 


10.2.2.1 Public blockchains 


Public blockchains are decentralized networks in which anybody may join,
read the ledger, submit transactions, and verify them without needing to
obtain authorization. Anyone can access the ledger and participate in the
consensus process thanks to these open and transparent blockchains.
Bitcoin and Ethereum are examples of public blockchains.

Public blockchains operate on the foundations of immutability,
transparency, and decentralization. Since everyone in the network has
equal access to the ledger and equal rights, censorship and the insertion
of invalid transactions are difficult. To confirm transactions and
safeguard the network, public blockchains rely on consensus techniques
like PoW or PoS.

Figure 10.2 Benefits of blockchain compared to other technologies [1].

One of the primary advantages of public blockchains is their high level 
of security, achieved through decentralized consensus and cryptographic 
hashing. Once confirmed, transactions on public blockchains are
irreversible, offering robust guarantees of immutability and resistance to
tampering.

Public blockchains are particularly well-suited for applications that
necessitate a high level of security, censorship resistance, and transparency. 
They are commonly utilized for cryptocurrency transactions, decentralized 
finance (DeFi), token issuance, and decentralized applications (DApps). 


10.2.2.2 Private blockchains 


A private blockchain, also known as a permissioned blockchain, is a
network where participation and access are limited to authorized entities
and control is typically centralized. In contrast to public blockchains,
which are accessible to everybody, private blockchains require
authorization to join, read, and write data. Examples of well-known
private blockchain platforms are R3 Corda and Hyperledger Fabric.

Private blockchains are usually used in consortiums or organizations 
with well-known and reliable members. Only authorized organizations are 
able to access sensitive data on the network thanks to permission systems 
and access controls. 

While private blockchains sacrifice decentralization and censorship resis- 
tance in favor of improved performance and scalability, they offer benefits 
such as enhanced privacy, control, and efficiency. Participants can engage in 
trusted transactions with one another without exposing sensitive informa- 
tion to the public. 

Enterprise applications where privacy, compliance, and scalability are 
critical, like supply chain management, identity verification, and document 
management, are ideal candidates for private blockchain technology. 


10.2.2.3 Consortium blockchains 


A hybrid approach that combines the best features of public and private 
blockchains is represented by consortium blockchains. A pre-selected set 
of participants manages the network collaboratively, sharing authority and 
decision-making duties in a consortium blockchain. Consortium block- 
chains enforce permissioning rules and access limitations while keeping a 
certain degree of decentralization. 

Consortium blockchains are overseen by a number of trusted organizations,
in contrast to private blockchains, which are typically controlled by a
single organization, and public blockchains, which are accessible to
everyone. These organizations work together to uphold consensus rules,
validate transactions, and keep the blockchain ledger current.
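
One way to picture shared decision-making in a consortium is a quorum rule: a block is accepted only when enough member organizations approve it. The sketch below assumes a threshold of more than two-thirds of the members, a choice common in Byzantine fault-tolerant designs but not taken from this chapter. The member names are hypothetical.

```python
CONSORTIUM = {"bank_a", "bank_b", "bank_c", "bank_d"}  # hypothetical members


def required_approvals(members):
    """Smallest number of approvals that is strictly more than two-thirds."""
    return (2 * len(members)) // 3 + 1


def is_accepted(approvals, members=CONSORTIUM):
    """Accept a block if enough distinct consortium members approved it."""
    valid_votes = set(approvals) & set(members)  # votes from outsiders are ignored
    return len(valid_votes) >= required_approvals(members)


quorum = required_approvals(CONSORTIUM)
three_members = is_accepted({"bank_a", "bank_b", "bank_c"})
two_members = is_accepted({"bank_a", "bank_b"})
with_outsider = is_accepted({"bank_a", "bank_b", "outsider"})
```

Because only votes from listed members count, the same rule also expresses the permissioning that distinguishes consortium chains from public ones.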

Consortium blockchains are appropriate for use cases requiring several 
stakeholders to work together and safely share data because they provide a 
balance between decentralization and control. Interbank payment systems, 
industry consortia, and supply chain networks are a few examples. Benefits 
of consortium blockchains include enhanced productivity, lower costs, and 
greater participant trust. Consortiums can improve transparency, expedite 
business procedures, and provide new value offerings for their members by 
utilizing blockchain technology.


10.3 TAXONOMY OF BLOCKCHAIN-BASED
DEEP LEARNING FRAMEWORKS 


This section discusses the organization of blockchain-based DL
frameworks. Combining DL algorithms with blockchain technology provides
a number of benefits, such as automated decision-making, data protection,
precise forecasting, effective data market management, and enhanced
system resilience. A thematic taxonomy is presented here to classify,
according to various criteria, the integration of DL techniques and
blockchain. These criteria, as seen in Figure 10.3, draw attention to
the similarities and differences between state-of-the-art DL frameworks
based on blockchain technology. An overview of the chosen parameters and
their technical details is given in the section that follows.


Figure 10.3 A blockchain taxonomy for deep learning frameworks. (Figure
not reproduced; legible branch labels include Application Areas,
Deployment Goals, Consensus Protocols, Blockchain Type, and Deep Learning
Models.)


10.4 SMART CONTRACTS AND DECENTRALIZED
APPLICATIONS (DAPPS)


In the realm of blockchain technology, smart contracts and decentralized
applications (DApps) are essential components that revolutionize the way
transactions are carried out and applications are developed on
decentralized networks. Smart contracts are self-executing agreements
that are encoded in the blockchain with predetermined terms and
circumstances. They ensure transparent and trustless transactions by
automating and enforcing the terms of an agreement between parties, doing
away with the need for middlemen.

Smart contracts are written in languages such as Solidity for Ethereum or,
for Hyperledger Fabric, in general-purpose languages such as Go, where
they are known as chaincode, and they are then deployed on blockchain
platforms. Once deployed, their code is generally immutable and cannot be
changed, so they offer a dependable method of carrying out agreements with
a reduced risk of fraud or manipulation. Smart contracts are triggered by
predefined conditions and execute automatically, facilitating the
exchange of assets, information, or services between parties.
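
The trigger-and-execute behavior of a smart contract can be imitated in ordinary Python. The class below is only a simulation of an escrow agreement, not code for any blockchain platform: the class, its method names, and the in-memory `balances` dictionary are assumptions for illustration. On a real chain, the equivalent logic would be written in a contract language and run by the network's nodes.

```python
class EscrowContract:
    """Toy escrow: the buyer's payment is released to the seller on delivery."""

    def __init__(self, buyer, seller, price):
        self.buyer = buyer
        self.seller = seller
        self.price = price
        self.deposited = False
        self.released = False

    def deposit(self, sender, amount, balances):
        """Lock the buyer's funds; reject wrong sender, wrong amount, or a repeat."""
        if sender != self.buyer or amount != self.price or self.deposited:
            raise ValueError("deposit rejected")
        if balances.get(sender, 0) < amount:
            raise ValueError("insufficient funds")
        balances[sender] -= amount
        self.deposited = True

    def confirm_delivery(self, sender, balances):
        """Predefined condition: once the buyer confirms, pay the seller exactly once."""
        if sender != self.buyer or not self.deposited or self.released:
            raise ValueError("release rejected")
        balances[self.seller] = balances.get(self.seller, 0) + self.price
        self.released = True


balances = {"buyer": 100, "seller": 0}
contract = EscrowContract("buyer", "seller", 40)
contract.deposit("buyer", 40, balances)
contract.confirm_delivery("buyer", balances)
```

The rules are enforced by the code itself rather than by an intermediary, which is the property the paragraph above attributes to smart contracts.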

DApps are applications that run on decentralized networks without a central
authority or point of control. They build on blockchain technology and
smart contracts, and are typically designed to be transparent, open-source,
and censorship-resistant, so that anyone can use and interact with them
without needing permission.

DApps can be created for a wide range of use cases in a variety of sectors,
including social media, gaming, banking, and supply chain management. They
inherit the decentralized, immutable, and transparent characteristics of
the underlying blockchain, which supports their security and reliability.

One of the main benefits of smart contracts and DApps is their capacity to
automate complicated procedures and remove intermediaries, which lowers
costs and boosts efficiency. In the finance industry, smart contracts have
the potential to automate the execution of financial agreements, such as
insurance policies, derivatives, and loans, without the need for
traditional intermediaries like banks or insurance firms. This expedites
the procedure and lowers the likelihood of mistakes and disputes.

Additionally, through tokenization and decentralized finance (DeFi), smart
contracts and DApps make it possible to develop new business models and
revenue streams. On the blockchain, tokens represent digital assets or
rights, and they can be used to facilitate transactions, grant access to
services, or take part in DApp governance processes. Without depending on
conventional financial institutions, DeFi protocols use smart contracts to
provide decentralized financial products and services like lending,
borrowing, trading, and yield farming.

DApps and smart contracts also face challenges. Because security flaws in
smart contract code can lead to exploits and financial losses, rigorous
testing and auditing are crucial. Scalability and usability issues also
hinder the widespread adoption of DApps, as blockchain networks struggle to
handle large transaction volumes and provide seamless user experiences.


10.5 CHALLENGES AND OPPORTUNITIES 
IN BLOCKCHAIN INTELLIGENCE 


10.5.1 Analyzing the complexities of blockchain data 


Jan et al. discuss the importance of analyzing blockchain data and identify
four aspects of analysis: security, privacy, performance, and price
prediction. They do not, however, specifically address the complexities of
blockchain data.

Analyzing the complexities of blockchain data involves navigating 
through the vast amount of information stored on the blockchain, under- 
standing its structure, and extracting valuable insights to derive meaningful 
conclusions. Blockchain data is inherently complex due to its decentralized 
and distributed nature, as well as its cryptographic mechanisms that ensure 
security and immutability. 

One of the primary challenges in analyzing blockchain data is its sheer 
volume and variety. Blockchains continuously grow as new transactions 
are added to the ledger, resulting in massive datasets that require efficient 
storage, retrieval, and processing mechanisms. Moreover, blockchain data 
comes in various formats, including transaction records, smart contract 
code, and metadata, which adds to the complexity of analysis. 

Another complexity arises from the combination of transparency and
pseudonymity in blockchain transactions. While blockchain data is publicly
accessible, identifying the entities behind transactions can be challenging
because participants are represented only by cryptographic addresses.
Transaction-pattern analysis and address-clustering techniques can help
attribute groups of addresses to the same entity, providing insights into
user behavior and network activity.
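
As an illustration of address clustering, the sketch below applies the
common-input-ownership heuristic, a widely used rule for UTXO-based chains
that the text does not name explicitly: all input addresses of one
transaction are assumed to belong to the same entity. The data layout (a
list of input-address lists) and the function name are assumptions made
for the example, and the heuristic can mislabel addresses in practice, for
instance with collaborative transactions.

```python
def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is a list of lists of input addresses (hypothetical
    layout). Clusters are merged with a simple union-find structure.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        for addr in inputs:
            find(addr)  # register every address, even single-input ones
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example: A1, A2, and A3 are linked through shared inputs.
txs = [["A1", "A2"], ["A2", "A3"], ["B1"], ["C1", "C2"]]
print(cluster_addresses(txs))
```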

Blockchain data analysis also requires an understanding of the consensus
mechanisms and protocol rules that govern the network. Different
blockchains use different consensus mechanisms, such as PoW, PoS, or
delegated Proof of Stake (dPoS), each with its own rules and incentives.
These determine how transaction confirmations, block propagation delays,
and network security should be interpreted.

Furthermore, blockchain data analysis often involves exploring the 
temporal aspects of transactions, such as timestamps and block intervals. 
Analyzing transaction timestamps can reveal patterns of activity, transac- 
tion frequency, and time-sensitive events on the blockchain. Understanding 
block intervals and confirmation times is essential for assessing network 
performance, scalability, and reliability. 
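
A minimal sketch of this kind of temporal analysis is shown below. It
computes the gaps between consecutive block timestamps and summarizes
them. The timestamps are made-up values, and the function assumes they are
listed in block-height order; on real chains, block timestamps are
reported by block producers and are not guaranteed to increase strictly,
so negative gaps can occur.

```python
from statistics import mean, median


def block_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive block timestamps."""
    return [later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])]


# Hypothetical Unix timestamps of six consecutive blocks.
block_times = [1_700_000_000, 1_700_000_540, 1_700_001_200,
               1_700_001_260, 1_700_002_400, 1_700_003_000]

gaps = block_intervals(block_times)
print("intervals:", gaps)
print("mean interval:", mean(gaps), "median interval:", median(gaps))
print("longest wait for a block:", max(gaps))
```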

Researchers and analysts use a range of tools and methods, such as data
visualization, statistical analysis, ML, and network analysis, to analyze
blockchain data efficiently. Data visualization techniques such as graphs,
charts, and heatmaps help reveal patterns and trends. Statistical
approaches, including time series analysis and descriptive statistics,
provide quantitative insights into blockchain metrics and performance
indicators.

ML algorithms can be used to identify anomalies or fraudulent activity,
forecast future trends, and find hidden patterns and correlations within
blockchain data. Network analysis techniques, such as graph theory and
centrality measures, are used to study the architecture and structure of
blockchain networks and help identify important participants, communities,
and network dynamics.


10.5.2 Identifying security and privacy challenges 


Identifying the security and privacy challenges of blockchain technology is
essential for ensuring the integrity, confidentiality, and resilience of
blockchain-based systems. Although blockchain offers numerous advantages,
such as decentralization, transparency, and immutability, it also raises
distinctive security and privacy considerations that must be addressed to
mitigate risks and vulnerabilities.

One of the main security challenges facing blockchain technology is the
possibility of 51% attacks, in which a single entity or a group of entities
gains control of the majority of the mining power in the network. When a
51% attack occurs on a blockchain that uses the PoW consensus mechanism,
such as Bitcoin, the attacker may be able to reverse recent transaction
confirmations, double-spend coins, or censor transactions. Similarly, a
majority stakeholder in a PoS blockchain can influence the consensus
process and thereby jeopardize network security.
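
The role of the majority threshold can be made quantitative with the
gambler's-ruin result from the Bitcoin white paper, which is brought in
here for illustration and is not part of this chapter's sources. If an
attacker controls a fraction q of the hash power and the honest network
controls p = 1 - q, the probability that the attacker ever catches up from
z blocks behind is 1 when q >= p and (q/p)^z otherwise. The function name
below is hypothetical, and the sketch covers only this simple catch-up
probability, not a full double-spend analysis.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind the honest chain (simple gambler's-ruin model)."""
    if q < 0.0 or q > 1.0 or z < 0:
        raise ValueError("q must be in [0, 1] and z must be non-negative")
    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    return (q / p) ** z


for share in (0.10, 0.30, 0.45, 0.51):
    print(share, catch_up_probability(share, z=6))
```

Below the 50% threshold, the probability falls off exponentially with the
number of confirmations z, which is why waiting for more confirmations
protects against minority attackers but not against a majority one.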

Smart contracts are self-executing programs whose rules are stored on the
blockchain. As such, they are susceptible to malicious attacks, logic
errors, and coding bugs. By exploiting smart contract flaws, attackers can
steal digital assets, cause financial losses, or disrupt blockchain-based
DApps.

Another major issue with blockchain technology is privacy, especially in
public blockchains where transaction data is visible to all network users.
Although blockchain transactions are pseudonymous, that is, associated with
cryptographic addresses rather than real identities, it is nevertheless
possible to track and analyze transaction patterns to uncover private
information about users’ financial activity.

Moreover, data recorded on the blockchain cannot be removed or changed
afterward. Although this immutability protects the integrity of blockchain
data and its resistance to manipulation, it also poses problems for privacy
protection: private data saved on the blockchain remains there
indefinitely, putting users at risk of privacy violations and creating
problems with regulatory compliance.

Researchers, developers, and policymakers are actively exploring solutions
and best practices to address these security and privacy challenges.
Enhancing consensus mechanisms and blockchain protocols is one way to
increase scalability and security while lowering the risk of 51% attacks
and other network problems. In addition, advanced cryptographic methods,
such as homomorphic encryption or zero-knowledge proofs (ZKPs), can improve
privacy by allowing private transactions to occur on public blockchains.

Additionally, developers are working to improve the security of smart
contracts through formal verification, code auditing, and bug bounty
programs, in order to identify and fix vulnerabilities before contracts are
deployed. By conducting comprehensive security assessments and following
robust coding practices, developers can reduce the risk of smart contract
exploits and protect users’ digital assets against theft or manipulation.

Furthermore, privacy-enhancing technologies, such as privacy coins, mixers,
and decentralized identity solutions, are being developed to strengthen
privacy protection on public blockchains. These technologies aim to give
users more control over their private information, mask user identities,
and anonymize transaction data.


10.5.3 Exploring opportunities for 
ML/DL integration 


Integrating ML and DL into various industries and applications holds
substantial promise for driving innovation, enhancing efficiency, and
uncovering novel insights. Incorporating ML/DL techniques into existing
systems and processes presents numerous opportunities for optimization,
automation, and intelligence across diverse domains.

Li et al. [3] examine the potential of combining data integration methods 
with ML to enhance model accuracy and automate data transformation 
workflows. 

A prominent opportunity is to apply ML and DL to analyze and extract
knowledge from the enormous amounts of data produced by companies,
institutions, and IoT devices [5]. ML algorithms can efficiently examine
massive datasets to find patterns, trends, and correlations that human
analysts would not readily notice. DL techniques such as neural networks
allow organizations to extract meaningful insights and reveal complex
relationships from unstructured data sources like text and video.

Furthermore, by offering predictive analytics and prescriptive
recommendations based on historical data and real-time inputs, the
integration of ML/DL can improve decision-making processes. Predictive
models built with ML algorithms can forecast future trends, anticipate
customer behavior, and optimize resource allocation. This gives
organizations the ability to make data-driven decisions and gain a
competitive edge in ever-changing markets.

Furthermore, the integration of ML/DL offers opportunities for automa- 
tion and process optimization across various industries. Robotic process 
automation (RPA) fueled by ML algorithms can streamline repetitive tasks, 
reduce manual effort, and enhance operational efficiency in domains such 
as finance, customer service, and supply chain management. DL techniques 
such as computer vision and NLP enable the automation of tasks that neces- 
sitate comprehension and processing of visual or textual information, such 
as image recognition, document processing, and language translation. 

Moreover, the integration of ML/DL can bolster cybersecurity measures 
by promptly detecting and mitigating threats. ML algorithms possess the 
capability to analyze network traffic, identify abnormal behavior, and 
detect security breaches or intrusions, thereby enabling organizations to 
proactively defend against cyberattacks and safeguard sensitive data and 
assets. 

In the healthcare sector, the integration of ML/DL holds the potential to
transform patient care, disease diagnosis, and drug discovery. ML
algorithms can analyze medical imaging data to help radiologists detect
abnormalities and diagnose disease at an early stage. DL-based drug
discovery platforms can expedite the identification of new drug candidates
and optimize drug development processes, leading to faster and more
effective treatments for various medical conditions.

Moreover, ML/DL integration with IoT devices and sensor networks 
makes it easier to create autonomous, intelligent systems that can react 
to changing environments. In order to forecast maintenance needs, opti- 
mize energy use in smart buildings and industrial facilities, and monitor 
equipment performance, ML algorithms can evaluate sensor data in real 
time. IoT sensor data can be processed by DL techniques like CNNs and 
RNNs to identify anomalies, forecast failures, and improve overall system 
dependability. 


10.6 DATA PREPROCESSING AND FEATURE 
ENGINEERING FOR BLOCKCHAIN ANALYSIS 


10.6.1 Data collection and preprocessing techniques 


Data collection and preprocessing methods play a pivotal role in ensuring
the quality, reliability, and usability of data for ML and data analysis
tasks. Effective data collection requires gathering relevant data sources
in a structured and methodical manner, while preprocessing methods focus on
cleaning, transforming, and preparing the data for analysis. The main steps
are described below:


Data Collection:

Identification of Data Sources: The first step in data collection is to
identify and select the data sources that contain the information needed
for analysis. These may include databases, APIs, web scraping, IoT
devices, sensors, social media platforms, or proprietary data sources.

Data Acquisition: Once the data sources have been identified, data is
obtained through means such as file downloads, database queries, API
access, or real-time collection from sensors or IoT devices. It is crucial
to ensure that the collected data is accurate, complete, and
representative of the problem domain.

Data Storage: After collection, the data is stored in an appropriate format
and storage system, such as a relational database, NoSQL database, data
lake, or cloud storage platform. Considerations for data storage include
scalability, reliability, security, and compliance with data privacy
regulations.

Data Preprocessing:

Data Cleaning: Data cleaning is the process of locating and fixing errors,
inconsistencies, and missing values in a dataset. This may involve
removing duplicate records, handling missing data through imputation or
deletion, correcting formatting errors, and treating outliers or
anomalies.

Data Transformation: Data transformation techniques turn unprocessed data
into a format that can be analyzed. To make sure that features have
comparable scales and distributions, this may involve feature scaling,
normalization, or standardization. Categorical variable encoding, data
distribution transformations, and feature engineering are examples of
additional transformations.

Feature Selection: Feature selection techniques are applied to determine
the most important characteristics or variables that enhance the model’s
predictive ability. This helps prevent overfitting, improve model
performance, and decrease dimensionality. Filter, wrapper, and embedded
feature selection techniques draw on statistical tests, model performance,
or domain expertise.

Data Integration: To establish a single dataset for analysis, data from
multiple sources or datasets are combined. This could mean joining datasets
using relational or spatial operations, aggregating data at different
granularities, or merging databases based on shared identifiers or keys.

Data Sampling: Data sampling techniques are employed to select a subset
of data points from the original dataset for analysis. This is especially
advantageous for large datasets where processing the entire dataset is
computationally expensive or impractical. Common sampling methods include
random sampling, stratified sampling, and oversampling/undersampling for
imbalanced datasets.

By employing effective data collection and preprocessing techniques, data
scientists and analysts can ensure the accuracy, reliability, and
suitability of the data used for ML and data analysis tasks, thus enabling
the generation of meaningful insights and the construction of robust
models. These techniques establish the groundwork for successful
data-driven decision-making and enable organizations to extract actionable
insights from their data assets.
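
Three of the preprocessing steps above (duplicate removal, imputation, and
feature scaling) can be sketched in a few lines of standard-library Python.
The record layout, the field names tx_id and amount, and the choice of
median imputation with min-max scaling are assumptions made for this
example; the chapter does not prescribe specific methods. The function
also assumes that at least one amount is observed.

```python
from statistics import median


def preprocess(records):
    """Deduplicate, impute, and scale a list of transaction dicts."""
    # 1. Cleaning: drop duplicate records, keyed on the transaction id.
    seen, unique = set(), []
    for rec in records:
        if rec["tx_id"] not in seen:
            seen.add(rec["tx_id"])
            unique.append(dict(rec))  # copy so the input is left untouched

    # 2. Cleaning: impute missing amounts with the median of observed ones.
    observed = [r["amount"] for r in unique if r["amount"] is not None]
    fill = median(observed)
    for r in unique:
        if r["amount"] is None:
            r["amount"] = fill

    # 3. Transformation: min-max scale amounts into the range [0, 1].
    lo = min(r["amount"] for r in unique)
    hi = max(r["amount"] for r in unique)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    for r in unique:
        r["amount_scaled"] = (r["amount"] - lo) / span
    return unique


raw = [
    {"tx_id": "t1", "amount": 10.0},
    {"tx_id": "t2", "amount": None},
    {"tx_id": "t2", "amount": None},  # duplicate record
    {"tx_id": "t3", "amount": 30.0},
    {"tx_id": "t4", "amount": 50.0},
]
clean = preprocess(raw)
for row in clean:
    print(row)
```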


10.6.2 Feature engineering approaches 
for blockchain data 


Feature engineering plays a crucial role in extracting meaningful insights 
and improving the performance of ML models when dealing with block- 
chain data. Blockchain data, with its unique characteristics such as trans- 
actional nature, timestamped records, and cryptographic security, requires 
specialized feature engineering approaches to effectively capture relevant 
information and patterns. Here are some key feature engineering approaches 
for blockchain data: 


Transaction-Level Features: At the most granular level, features can be 
extracted directly from individual transactions recorded on the block- 
chain. These features may include transaction amount, transaction 
type (e.g., payment, transfer, contract execution), sender and receiver 
addresses, transaction fees, and timestamps. Additionally, derived 
features such as transaction volume, frequency, and velocity can pro- 
vide insights into transactional behavior and network activity. 

Address-Level Features: Features can also be derived from addresses 
involved in transactions, including sender and receiver addresses as 
well as smart contract addresses. Address-level features may include 
the number of transactions associated with an address, the total value 
transacted, the number of unique counterparties, and the degree of 
activity over time. Address clustering techniques can be applied to 
group addresses belonging to the same entity or entity cluster, enabling 
more accurate modeling of user behavior. 

Temporal Features: Given the timestamped nature of blockchain trans- 
actions, temporal features can capture patterns and trends over time. 
These features may include transaction frequency and volume trends, 
time intervals between transactions, time-of-day patterns, and cyclical 
patterns such as weekly or monthly trends. Temporal features enable 
models to capture seasonality, periodicity, and other time-dependent 
dynamics in blockchain data. 

Jeyakumar et al. propose an automated feature engineering technique that
extracts numerous properties from blockchain transactions in order to
detect suspicious activity.

Graph-Based Features: Blockchain data can be represented as a graph, with
nodes representing addresses and edges representing transactions between
addresses. Graph-based features leverage the structural properties of the
transaction graph to capture network topology and connectivity patterns.
Features such as node degree, centrality measures (e.g., betweenness
centrality, PageRank), and graph motifs can provide insights into the
network structure and identify influential entities or addresses.

Transaction Sequence Features: Sequential patterns in transaction data 
can be captured using sequence-based features. These features may 
include transaction sequences or sequences of addresses involved in 
transactions, which can be encoded using techniques such as sequence 
embedding or RNNs. Sequence-based features enable models to cap- 
ture temporal dependencies and behavioral patterns in transaction 
sequences. 

Transaction Metadata Features: Many blockchain platforms support
additional metadata fields associated with transactions, such as
transaction tags, memos, or smart contract parameters. These fields can
provide contextual information, such as the purpose or description of a
transaction. Extracting and encoding metadata features can enrich the
feature space and provide additional context for modeling transaction
behavior.
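
As a small illustration of the address-level and temporal features
described above, the following sketch aggregates a handful of made-up
transfers into per-address features. The tuple layout (sender, receiver,
value, timestamp) and the feature names are assumptions for the example; a
real pipeline would read these fields from parsed block data, and
self-transfers would need separate handling because they are counted twice
here.

```python
from collections import defaultdict


def address_features(transactions):
    """Aggregate per-address features from
    (sender, receiver, value, timestamp) tuples."""
    stats = defaultdict(lambda: {"tx_count": 0, "total_value": 0.0,
                                 "counterparties": set(), "times": []})
    for sender, receiver, value, ts in transactions:
        # Each transfer contributes to both the sender and the receiver.
        for addr, other in ((sender, receiver), (receiver, sender)):
            s = stats[addr]
            s["tx_count"] += 1
            s["total_value"] += value
            s["counterparties"].add(other)
            s["times"].append(ts)

    features = {}
    for addr, s in stats.items():
        times = sorted(s["times"])
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[addr] = {
            "tx_count": s["tx_count"],
            "total_value": s["total_value"],
            "unique_counterparties": len(s["counterparties"]),
            # Temporal feature: mean seconds between consecutive transactions.
            "mean_gap": sum(gaps) / len(gaps) if gaps else None,
        }
    return features


txs = [
    ("A", "B", 5.0, 100),
    ("A", "C", 2.5, 160),
    ("B", "A", 1.0, 400),
]
feats = address_features(txs)
for addr in sorted(feats):
    print(addr, feats[addr])
```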


10.6.3 Handling imbalanced data and missing values 


Handling imbalanced data and missing values is of utmost importance in 
order to ensure the effectiveness and reliability of ML models. Imbalanced 
data arises when one class or category in the dataset is significantly more 
prevalent than others, resulting in biased predictions made by the model. In 
a similar vein, the presence of missing values in the dataset can introduce 
both noise and bias into the model, thereby impacting its performance and 
accuracy. 

To tackle imbalanced data, various techniques can be used. One common
approach is resampling, which entails either oversampling the minority
class or undersampling the majority class to achieve a more balanced
dataset. Oversampling techniques duplicate samples from the minority class
or generate synthetic samples using methods such as SMOTE (Synthetic
Minority Oversampling Technique). Undersampling methods, conversely,
randomly remove samples from the majority class to match the size of the
minority class. Another approach is to adjust the class weights during
model training so that misclassifications of the minority class are
penalized more heavily, which keeps the model from simply favoring the
majority class.

Addressing missing values requires careful preprocessing. One strategy is
to impute the missing values using simple statistics, replacing each
missing value with the mean, median, or mode of the respective feature.
Alternatively, missing values can be imputed with predictive models trained
on the remaining data, which predict each missing value from the observed
values of other features. Another approach is to treat missing values as a
distinct category or to leverage domain knowledge to infer them from
contextual information.
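
The sketch below illustrates three of the techniques from this section in
plain Python: class weighting, random oversampling, and mean imputation.
The weighting rule n_samples / (n_classes * class_count) is one common
"balanced" heuristic and is an assumption here, as are the function names
and the toy labels; SMOTE and model-based imputation are not shown.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, c in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - c):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


labels = ["legit"] * 8 + ["fraud"] * 2
rows = list(range(10))  # placeholder feature rows

weights = balanced_class_weights(labels)
x_balanced, y_balanced = random_oversample(rows, labels)
print(weights)
print(Counter(y_balanced))
print(impute_mean([4.0, None, 8.0]))
```

Class weighting leaves the dataset unchanged and adjusts the loss instead,
whereas oversampling changes the training set itself; which one is
preferable depends on the model and on how scarce the minority class is.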


10.7 FRAUD DETECTION AND ANOMALY DETECTION 


10.7.1 Detecting fraudulent activities 
in blockchain transactions 


Ogundokun et al. discuss the use of DL methods to detect phishing attacks
in a blockchain transaction network, but they do not specifically address
the detection of fraudulent activities in blockchain transactions more
broadly.

Detecting fraudulent activities in blockchain transactions is a significant 
challenge due to the decentralized and pseudonymous nature of block- 
chain networks. While blockchain technology provides transparency and 
immutability, malicious actors may exploit vulnerabilities or engage in 
illicit activities such as money laundering, fraud, or theft. To address these 
threats, various techniques and approaches are utilized to effectively detect 
and prevent fraudulent transactions. 

Anomaly detection is a common method for identifying fraudulent activi- 
ties in blockchain transactions. Anomaly detection algorithms are capable 
of identifying transactions or patterns that deviate significantly from the 
expected behavior of legitimate transactions. These anomalies may mani- 
fest as sudden spikes or drops in transaction volume, unusual transaction 
amounts, irregular transaction patterns, or unexpected changes in net- 
work activity. ML algorithms, including clustering, classification, or outlier 
detection models, can be trained on historical transaction data to identify 
suspicious patterns and highlight potentially fraudulent transactions for 
further examination. 
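
As one simple instance of outlier detection on transaction amounts, the
sketch below uses the modified z-score, which is based on the median and
the median absolute deviation (MAD) and is therefore less distorted by the
very outliers it is trying to find. This particular statistic, the
conventional threshold of 3.5, and the sample amounts are choices made for
the example rather than methods taken from the chapter.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds
    the threshold."""
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread in the data, so nothing can be flagged
    return [i for i, x in enumerate(amounts)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Made-up transaction amounts with one obviously unusual value.
amounts = [12.0, 15.0, 11.0, 14.0, 13.0, 950.0, 12.5]
suspicious = flag_outliers(amounts)
print("flagged indices:", suspicious)
print("flagged amounts:", [amounts[i] for i in suspicious])
```

Flagged transactions are candidates for further examination, not proof of
fraud; in practice such a rule would be one signal among several.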

Another approach is behavior-based analysis, which involves monitor- 
ing and analyzing the behavior of entities (e.g., addresses, users, or smart 
contracts) on the blockchain network. By tracking transaction history, fre- 
quency, volume, and interactions between entities, behavior-based analy- 
sis can identify abnormal or suspicious behavior indicative of fraudulent 
activities. Graph-based techniques, such as network analysis and central- 
ity measures, can be used to identify suspicious entities, detect transaction 
laundering, and uncover hidden connections between fraudulent actors. 


Additionally, pattern recognition techniques can be employed to iden- 
tify known fraud patterns or signatures in blockchain transactions. These 
patterns may include common tactics used in fraudulent schemes, such as 
Ponzi schemes, pump-and-dump schemes, or phishing attacks. By analyz- 
ing transaction metadata, transaction graphs, and blockchain network 
data, pattern recognition algorithms can detect similarities between known 
fraud cases and ongoing fraudulent activities, enabling proactive detection 
and prevention of fraudulent transactions. 

Furthermore, real-time monitoring and alerting systems can be imple- 
mented to detect and respond to fraudulent activities as they occur. These 
systems continuously monitor blockchain transactions, analyze transaction 
data in real time, and trigger alerts or notifications for suspicious activities. 
Automated response mechanisms, such as transaction freezing, blacklisting 
addresses, or triggering manual review processes, can be integrated into these 
systems to mitigate risks and prevent further fraudulent transactions [11]. 

Collaboration and information sharing among blockchain stakeholders, 
including exchanges, regulators, law enforcement agencies, and blockchain 
analytics firms, are also essential for detecting and combating fraudulent 
activities effectively. Sharing threat intelligence, fraud patterns, and suspi- 
cious transaction reports can help identify emerging threats, improve detec- 
tion capabilities, and coordinate efforts to disrupt fraudulent schemes and 
prosecute perpetrators. 


10.7.2 Anomaly detection techniques 
for blockchain networks 


Anomaly detection techniques for blockchain networks have been explored in
several papers. One approach is to use DL frameworks based on autoencoders
and attention mechanisms, such as GraphAEAtt, which can extract
high-dimensional features from graph-structured relationships [1]. Another
approach is to detect blockchain attacks with ML by training a federated
learning-based anomaly detection system on aggregate data gathered from
observing blockchain activity [2,6].

Anomaly detection techniques are essential for identifying unusual or 
suspicious behavior within blockchain networks, where transparency and 
immutability are critical but fraudulent activities can still occur. 

Given the decentralized and distributed nature of blockchain technology, 
detecting anomalies requires specialized approaches tailored to the unique 
characteristics of blockchain networks. Here, we explore some key anom- 
aly detection techniques used in blockchain networks. 


Address Clustering Analysis: Address clustering involves grouping
together blockchain addresses that are likely controlled by the same
entity or user. By analyzing transaction patterns and network topology,
address clustering techniques can identify clusters of addresses that
exhibit similar behavior or are involved in common transactions.
Anomalies may be detected when addresses deviate from their typical
clustering patterns or exhibit unexpected behavior, such as sudden
changes in transaction volume or connections to known illicit entities.

Transaction Graph Analysis: Transaction graph analysis leverages the
inherent structure of blockchain data, representing addresses as nodes
and transactions as edges in a graph. By analyzing the topology of the
transaction graph, anomalies such as unusual transaction flows, hub
addresses with high connectivity, or isolated clusters of addresses may
be detected. Graph-based anomaly detection techniques, such as graph
centrality measures or community detection algorithms, can uncover
abnormal network structures or suspicious activity patterns within the
blockchain network.

Temporal Analysis: Temporal analysis focuses on identifying anoma- 
lies based on changes in transaction patterns over time. By analyzing 
transaction timestamps, transaction frequency, or transaction volume 
trends, temporal anomaly detection techniques can detect sudden 
spikes or drops in activity, unusual patterns of transaction timing, or 
deviations from historical transaction behavior. These anomalies may 
indicate fraudulent activities, such as coordinated attacks or insider 
threats attempting to manipulate the blockchain network. 

Consensus Mechanism Monitoring: Consensus mechanism monitoring 
involves analyzing the behavior of network participants and valida- 
tors to detect anomalies in the consensus process. For example, in PoW 
blockchains, anomalies such as sudden changes in mining difficulty, 
hash rate fluctuations, or long forks in the blockchain may indicate 
attempted 51% attacks or mining pool manipulation. Similarly, in PoS 
blockchains, anomalies such as unusual voting patterns or stake concen- 
tration may signal attempts to compromise the integrity of the network. 

Machine Learning-Based Approaches: Anomalies in blockchain networks can
be detected with ML techniques, which encompass supervised and
unsupervised learning algorithms. Supervised learning models are trained
on labeled data to classify transactions or network behavior as normal or
anomalous. In contrast, unsupervised learning algorithms, such as
clustering or autoencoders, can identify unusual patterns or outliers in
the data without the need for labeled examples. With ML, anomaly
detection in blockchain networks can be automated and adapted to evolving
threats and attack vectors.
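
To illustrate the graph-based techniques listed above, the following sketch
ranks addresses with a plain power-iteration PageRank. It assumes a
directed graph whose nodes are addresses and whose edges are transfers
from sender to receiver; the edge list, the damping factor of 0.85, and
the fixed iteration count are illustrative choices, and a production
system would more likely rely on a graph library.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph of (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up transfers: four addresses pay into H, which forwards to A.
edges = [("A", "H"), ("B", "H"), ("C", "H"), ("D", "H"), ("H", "A")]
ranks = pagerank(edges)
hub = max(ranks, key=ranks.get)
print("highest-ranked address:", hub)
for addr in sorted(ranks, key=ranks.get, reverse=True):
    print(addr, round(ranks[addr], 4))
```

An address that collects value from many others, like H here, stands out
with a high score; whether such a hub is an exchange, a mixer, or a
fraudulent collector still has to be established by further analysis.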


10.7.3 Case studies and real-world applications 


The promise of blockchain technology to transform existing processes,
improve transparency, and stimulate innovation has attracted considerable
interest from a range of industries. Numerous case studies and real-world
applications demonstrate its benefits and practical use across sectors.


Supply Chain Management: One of the most prominent uses of block- 
chain technology is in the management of supply chains. Prominent 
corporations such as Walmart and IBM have effectively integrated 
blockchain solutions to monitor the provenance and flow of goods 
from their point of origin to the final customer. Blockchain provides 
supply chain transparency, traceability, and authenticity by record- 
ing transactions on a distributed ledger. As a result, there are fewer 
instances of inefficiencies, fraud, and counterfeiting. 

Financial Services: Because blockchain technology can make transactions
faster, more secure, and cheaper, it has significantly altered
the financial services sector. For example, Ripple uses blockchain to
help financial institutions send money internationally, which cuts
transaction fees and settlement times. Similarly, DeFi
platforms like MakerDAO and Compound use blockchain technology
to offer decentralized borrowing, trading, and lending services. By
removing traditional intermediaries, these platforms can promote
financial inclusion for underserved groups.

Healthcare: Blockchain technology has the potential to transform the
healthcare sector by supporting interoperability, enhancing transparency
in medical supply chains, and securely storing and transmitting
patient data. Companies like Medicalchain and SimplyVital Health
are developing blockchain-based solutions for pharmaceutical
traceability, medical credentialing, and electronic health record
(EHR) administration. These advancements give individuals
greater control over their health information while protecting
security and privacy.

Identity Management: Identity management systems can be greatly
enhanced by blockchain technology, as it provides a secure and
tamper-evident record of individuals’ identities and
credentials. For example, Estonian citizens can securely access
government services, sign documents, and authenticate themselves online
thanks to the government’s integration of blockchain-based
digital identification systems. Similarly, businesses like Civic
and uPort are developing decentralized identity platforms
that let users securely manage and benefit from their personal
information.

Supply Chain Traceability: A further noteworthy use of blockchain
technology is to guarantee the authenticity and traceability
of goods in a variety of sectors. For instance, VeChain
uses blockchain to track and confirm the legitimacy of prescription
drugs, food items, and luxury goods. This allows consumers to confirm
the provenance and quality of products, increases consumer
confidence, reduces the circulation of counterfeit goods, and promotes
sustainability by helping businesses monitor and reduce their
environmental impact.


10.8 BLOCKCHAIN ANALYTICS AND VISUALIZATION 


10.8.1 Exploratory data analysis of 
blockchain networks 


Exploratory Data Analysis (EDA) of blockchain networks examines the
structure, behavior, and characteristics of blockchain data to gain
insights and inform decision-making.

Yap et al.’s study focused on exploratory graph analysis of the Ethereum
blockchain’s network data. It combines network visualization with
mathematical and statistical modeling to analyze transaction data. For
EDA of blockchain networks, the special characteristics of blockchain
technology, such as its immutability, transparency, and decentralization,
present both opportunities and obstacles. This section examines the
essential elements and methods of EDA of blockchain networks [12].

The first step in the EDA of blockchain networks is the collection and
preparation of data for analysis. Blockchain data typically consists of
transaction records, block information, and network statistics obtained
from blockchain explorers, APIs, or direct access to the blockchain
network. Data preprocessing tasks may involve cleaning, filtering, and
transforming the raw data to eliminate noise, handle missing values, and
ensure consistency and integrity.


Descriptive Statistics: Descriptive statistics provide an overview of the 
basic characteristics and distribution of blockchain data. Key descrip- 
tive statistics include transaction counts, block sizes, transaction fees, 
transaction volumes, and time series analysis of network activity over 
time. Descriptive statistics help identify trends, patterns, and anoma- 
lies in blockchain data and provide insights into network dynamics 
and behavior. 

Transaction Analysis: Transaction analysis focuses on understanding 
the flow and behavior of transactions within the blockchain network. 
This involves examining transaction attributes such as sender and 
receiver addresses, transaction amounts, transaction types, and time- 
stamps. Transaction analysis techniques include visualizations such 
as histograms, pie charts, and scatter plots to explore transaction pat- 
terns, identify outliers, and detect suspicious activities. 

Network Topology Analysis: Network topology analysis involves studying
the structure and connectivity of the blockchain network. This
includes analyzing the distribution of nodes, the degree distribution of
the network, and the connectivity between nodes. Network analysis
techniques such as graph theory, centrality measures, and community
detection algorithms help identify key nodes, clusters, and subnetworks
within the blockchain network.

Temporal Analysis: Temporal analysis focuses on understanding how 
blockchain data evolves over time. This includes analyzing transaction 
timestamps, block intervals, and network activity trends. Temporal 
analysis techniques such as time series analysis, trend analysis, and 
seasonality detection help identify patterns, cyclicality, and anomalies 
in blockchain data over different time periods. 
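
The descriptive and temporal analyses listed above can be sketched in a few lines of Python. The record layout `(unix_timestamp, fee, value)`, the sample data, and the function names are hypothetical and serve only to illustrate the idea; real data would come from an explorer, an API, or a node.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, median

# Hypothetical records: (unix_timestamp, fee, value)
transactions = [
    (1700000000, 0.0004, 1.50),
    (1700000900, 0.0002, 0.25),
    (1700003700, 0.0010, 12.00),
    (1700004000, 0.0003, 0.75),
    (1700007300, 0.0005, 3.10),
]


def summarize(txs):
    """Basic descriptive statistics over fees and transferred values."""
    fees = [fee for _, fee, _ in txs]
    values = [value for _, _, value in txs]
    return {
        "count": len(txs),
        "mean_fee": mean(fees),
        "median_value": median(values),
        "total_value": sum(values),
    }


def hourly_counts(txs):
    """Number of transactions per UTC hour, ordered by time."""
    counts = Counter()
    for ts, _, _ in txs:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[hour.strftime("%Y-%m-%d %H:00")] += 1
    return dict(sorted(counts.items()))


print(summarize(transactions))
print(hourly_counts(transactions))
```

The hourly series is the usual starting point for trend and seasonality analysis; the same bucketing can be applied per block or per day.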


Address clustering techniques group together blockchain addresses that
are most likely under the control of the same entity or person.
This makes it easier to identify groups of addresses linked to particular
entities, such as wallets, exchanges, or individuals. Within the blockchain
network, address clustering makes it easier to identify entities, attribute
transactions, and analyze user behavior [10,13,14].
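
One widely used clustering rule for UTXO-based chains is the common-input-ownership heuristic: addresses that appear together as inputs of one transaction are assumed to belong to the same entity. The text does not name a specific heuristic, so the choice of this rule, the union-find implementation, and the input format (one list of input addresses per transaction) are assumptions made for illustration.

```python
def cluster_addresses(transactions):
    """Group addresses that co-occur as inputs of the same transaction.

    `transactions` is a list of lists; each inner list holds the input
    addresses of one transaction (hypothetical format).
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        find(inputs[0])  # register single-input transactions too
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


print(cluster_addresses([["A", "B"], ["B", "C"], ["D"], ["E", "F"]]))
```

The heuristic can be wrong, for example for CoinJoin transactions, where inputs belong to different users by design, so clusters are best treated as hypotheses rather than facts.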

The goal of anomaly detection techniques is to locate unusual or
suspicious activity in the blockchain network. This entails looking
for outliers, departures from the norm, and possible security risks
such as 51% or double-spending attacks. Anomaly detection techniques
include statistical analysis, ML algorithms, and heuristics-based
techniques tailored to the properties of blockchain data.
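
As an example of a heuristics-based check, the sketch below looks for candidate double-spends by finding outputs that are referenced as an input by more than one transaction. It assumes a UTXO-style data model in which each input is an outpoint `(previous_txid, output_index)`; the dictionary layout and the sample data are hypothetical.

```python
from collections import defaultdict


def find_conflicting_spends(transactions):
    """Map each outpoint spent more than once to the txids that spend it."""
    spenders = defaultdict(list)
    for tx in transactions:
        for outpoint in tx["inputs"]:
            spenders[outpoint].append(tx["txid"])
    return {op: txids for op, txids in spenders.items() if len(txids) > 1}


# Hypothetical mempool snapshot: t1 and t3 both try to spend output 0 of "aa".
sample = [
    {"txid": "t1", "inputs": [("aa", 0)]},
    {"txid": "t2", "inputs": [("aa", 1), ("bb", 0)]},
    {"txid": "t3", "inputs": [("aa", 0)]},
]
print(find_conflicting_spends(sample))
```

A rule like this is cheap enough to run on every new transaction, and its hits can be passed to statistical or ML-based detectors for further scoring.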


10.8.2 Visualizing blockchain data for insights 


Visualizing blockchain data is a powerful technique for gaining insights,
understanding patterns, and communicating complex information in a more
intuitive and accessible manner. Vinceslas et al. [8] proposed an abstraction
layer architecture to improve the auditability and intuitiveness of complex
business analysis of distributed ledger systems. By utilizing various
visualization techniques, stakeholders can effectively explore and analyze
blockchain data, uncovering valuable insights and informing decision-making
processes. This section explores the importance of visualizing blockchain
data and discusses key visualization techniques used in this context.


Understanding Complex Relationships: Blockchain data often involves 
complex relationships between transactions, addresses, blocks, and 
network participants. Visualization techniques such as network 
graphs, Sankey diagrams, and chord diagrams help illustrate these 
relationships visually, enabling stakeholders to understand the flow of 
transactions, identify clusters of addresses, and explore the connectivity
and interdependencies within the blockchain network.

Identifying Transaction Patterns: Visualizing transaction patterns is
essential for understanding the behavior of users, identifying trends, 
and detecting anomalies within the blockchain network. Time series 
plots, heatmaps, and histograms can be used to visualize transaction 
volumes, transaction frequencies, and transaction values over time. By 
visualizing transaction patterns, stakeholders can identify peak peri- 
ods of activity, detect irregularities, and assess the overall health of 
the blockchain network. 

Exploring Network Topology: Visualizing the network topology of the 
blockchain network helps stakeholders understand the structure, con- 
nectivity, and centrality of nodes within the network. Techniques such 
as node-link diagrams, force-directed layouts, and tree maps can be 
used to visualize the distribution of nodes, the degree distribution of 
the network, and the relationships between nodes. Network topol- 
ogy visualizations enable stakeholders to identify key nodes, detect 
clusters, and assess the resilience and robustness of the blockchain 
network [14]. 

Tracking Transaction Flows: Visualizing transaction flows within the 
blockchain network provides insights into the movement of assets, the 
path of transactions, and the distribution of funds across addresses. 
Flow diagrams, Sankey diagrams, and flowcharts can be used to 
visualize transaction flows, highlighting the origins, destinations, 
and intermediaries involved in transactions. By tracking transaction 
flows visually, stakeholders can detect suspicious activities, trace the 
movement of funds, and assess the efficiency of transaction processing 
within the blockchain network. 

Monitoring Market Dynamics: Visualizing market dynamics within 
blockchain networks, such as cryptocurrency markets, helps stake- 
holders analyze price movements, trading volumes, and market 
trends. Candlestick charts, line charts, and scatter plots can be 
used to visualize price fluctuations, trading volumes, and market 
indicators over time. By monitoring market dynamics visually, 
stakeholders can identify trading patterns, assess market senti- 
ment, and make informed decisions in cryptocurrency trading and 
investment. 

Detecting Anomalies and Security Threats: Visualizing anomalies and 
security threats within the blockchain network helps stakeholders 
identify irregularities, detect potential attacks, and mitigate risks 
effectively. Anomaly detection visualizations, such as heatmaps, scat- 
ter plots, and box plots, can be used to visualize outliers, deviations 
from expected behavior, and potential security breaches within the 
blockchain network. By visualizing anomalies and security threats, 
stakeholders can take proactive measures to safeguard the integrity 
and security of the blockchain ecosystem.


10.9 PRIVACY PRESERVATION AND
DE-ANONYMIZATION TECHNIQUES


10.9.1 Preserving user privacy in 
blockchain transactions 


Because blockchain technology is inherently transparent and immutable, 
protecting user privacy in transactions is very important. Although block- 
chain technology has many benefits, like data integrity and decentraliza- 
tion, it also has drawbacks with regard to user confidentiality and privacy. 
To protect user privacy in blockchain transactions, a number of strategies 
and tactics are used in this context. 

Using cryptographic techniques like encryption, ZKPs, and ring signatures
is one of the main ways to protect users’ privacy in blockchain
transactions. Encryption ensures that private information,
such as transaction amounts or user identities, is hidden from prying eyes
before being added to the blockchain. With ZKPs, one party (the prover) can
show another party (the verifier) that a statement is true without
revealing any further information. As a result, users can transact without
disclosing personal information. With ring signatures, a user signs a
transaction on behalf of a group, which makes it difficult to identify which
group member initiated the transaction and thereby preserves the signer’s
anonymity.
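
To make the prover and verifier roles concrete, here is a minimal sketch of the interactive Schnorr identification protocol, a classic (honest-verifier) zero-knowledge proof of knowledge of a discrete logarithm. The text does not prescribe this protocol; it is chosen here only as a compact illustration. The group parameters are toy values that are far too small to be secure.

```python
import secrets

# Toy parameters (insecure, for illustration only): G generates a subgroup
# of prime order Q in the multiplicative group modulo the prime P.
P, Q, G = 23, 11, 2


def keygen():
    x = secrets.randbelow(Q - 1) + 1      # secret key in [1, Q-1]
    return x, pow(G, x, P)                # (secret, public key y = G^x)


def commit():
    r = secrets.randbelow(Q)              # prover's one-time nonce
    return r, pow(G, r, P)                # (nonce, commitment t = G^r)


def respond(x, r, c):
    return (r + c * x) % Q                # response s = r + c*x mod Q


def verify(y, t, c, s):
    # Accept if G^s == t * y^c (mod P); the secret x is never sent.
    return pow(G, s, P) == (t * pow(y, c, P)) % P


x, y = keygen()
r, t = commit()
c = secrets.randbelow(Q)                  # verifier's random challenge
s = respond(x, r, c)
print(verify(y, t, c, s))
```

The verifier learns that the prover knows the secret behind the public key `y`, but sees only the commitment, the challenge, and the response. Production systems rely on vetted libraries and much larger groups or elliptic curves.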

Preserving user privacy in blockchain transactions can also be achieved
by using blockchain platforms and protocols that prioritize privacy.
Platforms such as Monero, Zcash, and Dash are designed to protect user
privacy through dedicated mechanisms: Monero relies on stealth addresses
and ring signatures, Zcash on zk-SNARKs (Zero-Knowledge Succinct
Non-Interactive Arguments of Knowledge), and Dash on optional
CoinJoin-based mixing. By hiding or masking transaction details such as
sender and recipient addresses and transaction amounts to varying degrees,
these measures protect user privacy while upholding the blockchain’s
integrity and security.

To improve user privacy in blockchain transactions, privacy-enhancing
technologies like mixers and tumblers are used in addition to cryptographic
methods and privacy-focused blockchain platforms. Mixers and tumblers
combine transactions from several users and blend them together, making it
difficult to trace individual transactions. As a result, participants
benefit from increased privacy and anonymity because the link between
sender and recipient addresses is obscured [15].

Furthermore, maintaining user privacy in blockchain transactions
depends heavily on regulatory compliance and data protection procedures.
The handling and processing of personal data, including data held
on blockchain networks, is subject to strict regulations, such as the General
Data Protection Regulation (GDPR) in the European Union. To guarantee
that user privacy rights are respected and protected, blockchain developers
and operators must comply with these requirements by putting in place
privacy-enhancing measures such as data minimization, pseudonymization,
and user consent mechanisms.


10.9.2 De-anonymization attacks 
and countermeasures 


De-anonymization attacks pose a significant threat to user privacy in block- 
chain transactions, as they aim to reveal the identities of users participating 
in transactions on the blockchain. These attacks exploit vulnerabilities in 
blockchain networks and privacy-enhancing technologies to uncover sensi- 
tive information about users, such as their real-world identities, transaction 
history, and financial activities. In response to these threats, various coun- 
termeasures and mitigation strategies have been developed to enhance user 
privacy and protect against de-anonymization attacks. 

One common de-anonymization attack is known as network analysis, 
where adversaries analyze the structure and connectivity of the blockchain 
network to infer relationships between users and trace transactions back 
to their origin. By analyzing transaction patterns, network topology, and 
transaction flows, attackers can identify clusters of addresses associated 
with specific users or entities, enabling them to de-anonymize users and 
uncover sensitive information. 

Another de-anonymization technique is known as de-anonymization 
through auxiliary information. This attack leverages external sources of 
information, such as social media profiles, online forums, or leaked data- 
bases, to link blockchain addresses to real-world identities. By correlating 
blockchain addresses with publicly available information, attackers can 
identify individuals participating in blockchain transactions and potentially 
expose their private activities. 

Sybil attacks are another issue that raises questions about user privacy
in blockchain networks. In these attacks, an adversary creates many false
identities or pseudonymous accounts in an attempt to take over a sizable
portion of the network. By controlling a large number of network nodes or
addresses, attackers can influence the network’s consensus, control the
flow of transactions, and conduct targeted attacks against certain
users or transactions.

To address de-anonymization attacks and safeguard the privacy of users 
in blockchain transactions, various countermeasures and strategies have 
been devised: 


1. Use of Privacy-Enhancing Technologies: To improve user anonymity,
privacy-focused blockchain platforms and protocols, like Monero,
Zcash, and Dash, integrate advanced cryptographic approaches
and privacy features. These platforms conceal transaction data and
protect user privacy by using techniques like ring signatures, stealth
addresses, and ZKPs.

2. Utilizing CoinJoin and Mixing Services: By combining their transac- 
tions with those of other users, users can use CoinJoin and mixing 
services to make it more difficult for adversaries to track down the 
source of a transaction. These services combine transactions from 
several users, resulting in a jumbled transaction history that hides the 
relationship between the source and recipient addresses [16]. 

3. Prevention of Address Reuse: Avoiding the reuse of addresses can help
mitigate the risk of de-anonymization attacks. Reusing addresses can 
facilitate the linking of multiple transactions to the same user, mak- 
ing it easier for attackers to identify and exploit user information. 
By using new addresses for each transaction or implementing hierar- 
chical deterministic (HD) wallets, the likelihood of address reuse is 
reduced, thereby enhancing user privacy. 

4. Enhancements in Network and Transactional Privacy: Implementing 
privacy enhancements at the network level, such as onion routing or 
mixnets, can obscure transaction metadata and prevent adversaries 
from monitoring the flow of transactions. Furthermore, adopting trans- 
actional privacy enhancements like confidential transactions or bullet- 
proofs can conceal transaction amounts and further elevate user privacy. 
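
The CoinJoin idea from the list above can be sketched as follows: several users contribute one input each and receive one output of the same denomination at a fresh address, and the order of inputs and outputs is shuffled so that an observer cannot tell which output belongs to which input. The field names and the `build_coinjoin` helper are hypothetical; signing, fees, and change outputs are deliberately omitted.

```python
import random


def build_coinjoin(participants, denomination, rng=None):
    """Combine one input and one equal-value output per participant.

    Each participant is a dict with the hypothetical keys
    'input_address' and 'fresh_output_address'.
    """
    rng = rng or random.Random()
    inputs = [p["input_address"] for p in participants]
    outputs = [(p["fresh_output_address"], denomination) for p in participants]
    # Shuffle independently so positions do not link inputs to outputs.
    rng.shuffle(inputs)
    rng.shuffle(outputs)
    return {"inputs": inputs, "outputs": outputs}


users = [
    {"input_address": f"in{i}", "fresh_output_address": f"out{i}"}
    for i in range(3)
]
print(build_coinjoin(users, 0.1))
```

Equal output values matter: if the amounts differed, an observer could match inputs to outputs by value. Using a fresh output address for each participant also follows the address-reuse advice in item 3.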


Apart from the technological solutions outlined above, regulatory
compliance and following best practices are essential for protecting user
privacy and reducing the likelihood of de-anonymization attacks. Privacy
protection can be enhanced by adhering to legal standards, such as those
pertaining to data protection and anti-money laundering (AML) legislation.
Adopting best practices for user consent, pseudonymization, and data
minimization further lowers the risk of user data exposure.


10.9.3 Privacy-preserving machine learning 
techniques for blockchain 


Privacy-preserving ML techniques play a significant role in safeguarding
sensitive data while extracting valuable insights from blockchain networks.
The increasing adoption of blockchain technology in various industries
emphasizes the importance of maintaining data privacy and confidentiality.
In this regard, privacy-preserving ML techniques provide innovative
solutions to address privacy concerns while leveraging the advantages offered
by blockchain technology. This section explores several key
privacy-preserving ML techniques applicable to blockchain applications.

Homomorphic encryption is a cryptographic technique that makes it possible
to perform calculations on encrypted data without decrypting it.
This preserves the secrecy of sensitive data and allows privacy-preserving
ML algorithms to analyze encrypted data stored on the blockchain.
Homomorphic encryption safeguards user confidentiality and privacy
by encrypting data before it is processed or stored on the blockchain.
It finds use in predictive analytics and data sharing in healthcare,
finance, and other sensitive domains.
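
A small worked example may help. The Paillier cryptosystem is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The sketch below is a toy implementation with tiny, insecure primes and the common simplification g = n + 1; it is an illustration chosen here, not a scheme named in the text, and real deployments use audited libraries with keys of 2048 bits or more.

```python
import math
import secrets


def keygen(p=61, q=53):
    """Toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)


n, priv = keygen()
total = add_encrypted(n, encrypt(n, 120), encrypt(n, 345))
print(decrypt(n, priv, total))
```

The party that adds the ciphertexts never sees 120 or 345; only the key holder can decrypt the sum, which must stay below n for the result to be exact.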

Secure Multi-Party Computation (MPC) is a cryptographic protocol 
that enables multiple parties to jointly compute a function over their inputs 
while maintaining the privacy of those inputs. In the context of blockchain, 
MPC allows participants to collaborate on ML tasks, such as model train- 
ing or prediction, without revealing their private data to each other or third 
parties. This ensures privacy and confidentiality while facilitating collab- 
orative ML in decentralized environments. 
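
One of the simplest MPC building blocks is additive secret sharing, sketched below for a secure sum. Each party splits its private input into random shares that add up to the input modulo a public modulus, so no single share reveals anything about the value. The modulus, the function names, and the honest-but-curious setting are assumptions made for this illustration.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus; all arithmetic is done modulo this value


def share(secret, n_parties):
    """Split `secret` into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate all parties in one process to show the data flow."""
    n = len(private_inputs)
    # Row i holds the shares that party i hands out, one per party.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received and publishes only that subtotal.
    subtotals = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(subtotals) % MODULUS


print(secure_sum([10, 20, 30]))
```

The same pattern underlies secure aggregation of model updates, where the shared values are gradient or weight vectors instead of single numbers.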

Differential privacy is a privacy-preserving technique that adds noise to
the results of queries or computations over sensitive data in order to
offer strong privacy assurances. By adding calibrated levels of noise,
differential privacy allows for accurate aggregate analysis while
preventing adversaries from deriving sensitive information about
individual data points. Applying differential privacy to ML models built
on blockchain data protects them against unauthorized disclosure of
sensitive information.
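
The standard way to add calibrated noise to a numeric query is the Laplace mechanism, which draws noise with scale equal to the query's sensitivity divided by the privacy parameter epsilon. The sketch below applies it to a counting query, whose sensitivity is 1. The function names are hypothetical, and the Laplace sample is generated as the difference of two exponential variates using only the standard library.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    rng = rng or random.Random()
    sensitivity = 1  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Smaller epsilon means more noise and stronger privacy.
print(private_count(100, epsilon=1.0))
print(private_count(100, epsilon=0.1))
```

Each released answer consumes part of a privacy budget, so repeated queries over the same data need a smaller epsilon per query to keep the overall guarantee.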

Federated Learning is a decentralized ML technique in which model
training is done locally on edge devices or nodes, and only model
updates are exchanged with a central server or aggregator. This
eliminates the need to share raw data and allows collaborative model
training across remote data sources, like blockchain nodes. Federated
learning preserves user privacy by keeping sensitive data on local devices
while model updates are shared and aggregated centrally. It is especially
appropriate for blockchain network applications where privacy is a concern.
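
A minimal sketch of this loop, in the style of federated averaging, is shown below for a one-variable linear model. Each node takes a gradient step on its own data, and the aggregator combines the resulting weights, weighted by local dataset size. The model, learning rate, function names, and data are illustrative assumptions; practical systems add secure aggregation, client sampling, and multiple local epochs.

```python
def local_update(weights, data, lr=0.1):
    """One gradient-descent step for y ~ w*x + b on a node's private data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(updates, sizes):
    """Average model weights, weighted by each node's number of samples."""
    total = sum(sizes)
    return tuple(
        sum(u[i] * s for u, s in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    )


def train(node_datasets, rounds=200):
    weights = (0.0, 0.0)
    for _ in range(rounds):
        # Raw data never leaves a node; only updated weights are sent.
        updates = [local_update(weights, d) for d in node_datasets]
        weights = federated_average(updates, [len(d) for d in node_datasets])
    return weights


# Two hypothetical nodes holding points from the line y = 2x + 1.
nodes = [[(0, 1), (1, 3)], [(2, 5), (3, 7), (4, 9)]]
print(train(nodes))
```

Model updates can still leak information about the training data, which is why federated learning is often combined with differential privacy or MPC-based aggregation.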

ZKPs are cryptographic protocols that let one party (the prover) show
another party (the verifier) that a statement is true without revealing any
further information. Within the blockchain framework, ZKPs can be used
to confirm the correctness of calculations or transactions without revealing
confidential inputs or information. In this way, transaction authenticity
and integrity are preserved while user privacy is protected.


10.10 NETWORK SECURITY AND 
ATTACK DETECTION 


10.10.1 Understanding the security 
of blockchain networks 


Understanding blockchain security is essential to assessing the strength
and dependability of blockchain technology, which forms the basis for
a wide range of industries and applications. Blockchain, which
was first created as the foundational technology for Bitcoin, has matured into
a flexible framework with a wide range of uses, including supply chain
management and banking. Its distributed, decentralized architecture,
together with cryptographic principles, gives it intrinsic security. But to
truly understand blockchain network security, one needs to examine its
essential elements, possible weak points, and security risk-mitigation
techniques.

The potential of blockchain technology in network security has attracted
attention. It provides auditability, security, pseudonymity, immutability,
and decentralization. By offering a distributed, transparent, and
tamper-resistant ledger for secure data exchange across the network,
blockchain can address some of the limitations of conventional network
security approaches. A sociotechnical security analysis of blockchain
systems reveals discrepancies between the social, technical, and
infrastructural layers that impact the security and trust assumptions.
Moreover, blockchain has been investigated in the field of cybersecurity,
where it has proven beneficial in addressing security issues and offering
advantages such as data integrity and confidentiality.

Decentralization, consensus mechanisms, cryptography, an immutable
ledger, and network design are all parts of blockchain network security.
Thanks to decentralization, blockchain networks operate through a
distributed network of nodes rather than depending on a central authority.
By removing single points of failure and lowering the possibility of
hostile attacks or manipulation, this improves security. Consensus
mechanisms, such as PoW, PoS, DPoS, and Practical Byzantine Fault Tolerance
(PBFT), ensure that network participants agree on the validity of
transactions and the order in which they are recorded on the blockchain.
By preventing double-spending, these mechanisms protect the blockchain
network’s integrity.

Blockchain network security is greatly aided by cryptography, which
offers digital signature, encryption, and authentication techniques.
Public-key cryptography is widely used for generating cryptographic
key pairs, signing transactions, confirming asset ownership, and
enabling secure communication between network users.

A transaction cannot be changed or removed once it has been recorded,
owing to the blockchain ledger’s immutability. Cryptographic hash
functions and the consensus process make it computationally infeasible to
change previous transactions without the agreement of the majority of
network participants.

Peer-to-peer (P2P) networking is used in blockchain networks, where nodes
interact with one another directly, without intermediaries. By
facilitating redundancy, consensus, and data distribution, this architecture
improves the blockchain network’s resilience and security.

While blockchain technology presents robust security features, it is not
impervious to vulnerabilities and threats. Blockchain networks face a
number of common risks, such as 51% attacks. A 51% attack occurs in
PoW consensus systems when a single person or organization controls the
majority of the network’s hashing power. This gives the attacker
the ability to reorder or exclude transactions, reverse transactions they
have made, or spend coins twice.

Sybil attacks are another such weakness. In Sybil attacks, several fictitious
identities or nodes are created in an attempt to take control of the network
or sway the consensus process. Because these attacks interfere with
consensus processes and spread misleading information, they jeopardize the
security and integrity of the blockchain network.
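
The role of hash functions in ledger immutability can be illustrated with a minimal hash-linked chain. Each block stores the hash of its predecessor, so changing an old transaction changes that block's hash and breaks every later link. The block layout and function names below are hypothetical, and consensus (for example, proof of work) is left out on purpose.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64


def block_hash(index, prev_hash, transactions):
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(batches):
    chain, prev = [], GENESIS_PREV
    for i, txs in enumerate(batches):
        h = block_hash(i, prev, txs)
        chain.append({"index": i, "prev_hash": prev, "transactions": txs, "hash": h})
        prev = h
    return chain


def is_valid(chain):
    """Recompute every hash and check that each block links to the previous one."""
    prev = GENESIS_PREV
    for block in chain:
        expected = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


chain = build_chain([["a->b:1"], ["b->c:2"], ["c->a:3"]])
print(is_valid(chain))                    # True
chain[0]["transactions"] = ["a->b:100"]   # tamper with history
print(is_valid(chain))                    # False
```

Hashing alone only makes tampering detectable. It is the consensus process that makes rewriting the chain expensive, since an attacker would have to recompute and win acceptance for every later block.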

Vulnerabilities in smart contracts present an additional danger.
Smart contracts, which are self-executing contracts with predefined
conditions, can contain coding faults or security holes that attackers
exploit. Such vulnerabilities put users and blockchain applications at
risk of theft, unauthorized activity, or loss of funds.

Furthermore, privacy risks emerge despite the pseudonymous nature 
of blockchain transactions. The traceability of transactions on the public 
ledger allows for the analysis of transaction patterns, network traffic, and 
metadata, which can unveil sensitive information about users and their 
transactions, posing a threat to privacy. 

Double-spending is also a concern. It occurs when a user spends the
same digital asset more than once, leading to inconsistencies in the
blockchain ledger. While consensus mechanisms such as PoW and PoS are
designed to prevent double-spending, some risk persists in permissionless
blockchain networks, for example when an attacker can outpace the honest
chain before a payment has received enough confirmations.
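
For PoW chains, the residual risk can be quantified with the gambler's-ruin expression from the Bitcoin white paper: an attacker with a share q of the hash power who is z blocks behind ever catches up with probability (q/p)^z, where p = 1 - q, and with probability 1 if q >= p. The sketch below evaluates this simplified expression; it ignores the white paper's further Poisson weighting of the attacker's progress, and the function name is hypothetical.

```python
def catch_up_probability(attacker_share, blocks_behind):
    """Probability that an attacker ever catches up from `blocks_behind` blocks."""
    q = attacker_share
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually overtakes the honest chain
    return (q / p) ** blocks_behind


for share in (0.10, 0.30, 0.51):
    row = [round(catch_up_probability(share, z), 6) for z in (1, 3, 6)]
    print(share, row)
```

The probability falls geometrically with the number of confirmations as long as the attacker holds a minority of the hash power, which is why recipients wait for several blocks before treating a payment as final.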

To mitigate security risks and enhance blockchain network security, 
several strategies and best practices are employed. Diversifying consensus 
mechanisms through hybrid consensus models reduces the risk of 51% 
attacks and enhances network security. Transparent governance mecha- 
nisms ensure that decisions regarding network upgrades, protocol changes, 
and security enhancements are made collaboratively and transparently. 
Regular code audits and security assessments of blockchain protocols, 
smart contracts, and network infrastructure help identify and mitigate vul- 
nerabilities early in the development lifecycle. 

On the blockchain, user privacy and confidentiality are improved by imple- 
menting privacy-enhancing technologies like homomorphic encryption, ring 
signatures, and ZKPs. Finally, the maintenance of a safe and robust block- 
chain ecosystem depends on educating developers, users, and stakeholders 
on blockchain security best practices, dangers, and mitigation techniques. 
Workshops, security awareness campaigns, and training sessions educate 
participants about security risks and provide them with the tools they need 
to take preventative action to safeguard the network and themselves. 


10.10.2 Detecting and mitigating attacks 
with machine learning 


ML has become a powerful tool for strengthening cybersecurity defenses
across a range of domains when used for attack detection and mitigation.
Given the growth of digital threats and the increasing sophistication of
attacks, relying exclusively on traditional security measures is frequently
insufficient to defend against emerging threats. ML techniques can help
make cybersecurity measures more resilient and effective by quickly
identifying, analyzing, and reacting to cyber threats in real
time. This section explores how ML can be used to detect and mitigate
attacks, the difficulties that come with using it, and best practices for
putting ML-based security solutions into practice.


10.10.2.1 The application of machine 
learning in cybersecurity 


Without the need for explicit programming, ML techniques help computer 
systems learn from data, identify patterns, and make predictions or judg- 
ments. In cybersecurity, ML techniques are used to analyze large amounts 
of security data, including system logs, network traffic, and user behavior. 
The goal of this analysis is to find abnormalities, identify malicious activity, 
and mitigate cyber threats. Security systems based on ML have many ben- 
efits, such as automation, scalability, and adaptability to changing threats. 


10.10.2.2 Detection of cyberattacks 
through machine learning 


Anomaly Detection: This popular ML technique is used to find odd or
suspicious activity in cybersecurity data. ML algorithms learn the normal
patterns and behavior of networks, systems, and users, and then flag
departures from this baseline as possible anomalies. Methods including
density estimation, clustering, and other unsupervised learning approaches
are used to find patterns in system logs, network traffic, and
user activity.

Signature-based Detection: This method identifies known threats or
malicious activity using signatures, which are predefined rules or
patterns. ML algorithms can be trained on historical data to learn these
characteristics and to classify new instances based on how similar they are
to known threats. Signature-based detection works effectively for
identifying viruses, malware, and other common attack vectors.
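
The rule-matching core of signature-based detection can be sketched with regular expressions over log lines. The two signatures below are hypothetical and far simpler than curated rule sets; in an ML-assisted pipeline, matches like these would typically serve as labels or features for a classifier rather than as the final verdict.

```python
import re

# Hypothetical signatures; real rule sets are far larger and curated.
SIGNATURES = {
    "sql_injection": re.compile(
        r"\bunion\s+select\b|'\s*or\s+1\s*=\s*1", re.IGNORECASE
    ),
    "path_traversal": re.compile(r"\.\./\.\./"),
}


def match_signatures(log_line):
    """Return the sorted names of all signatures that match the log line."""
    return sorted(
        name for name, pattern in SIGNATURES.items() if pattern.search(log_line)
    )


print(match_signatures("GET /item?id=1 UNION SELECT password FROM users"))
print(match_signatures("GET /../../etc/passwd"))
print(match_signatures("GET /index.html"))
```

Signatures catch only what they describe, which is why they are usually paired with the anomaly-based methods discussed above to cover previously unseen attacks.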

Behavioral Analysis: Behavioral analysis monitors and assesses how
individuals, devices, and applications behave in order to spot
abnormalities that might point to malicious activity. Supervised learning
and reinforcement learning are two ML techniques that can be used to learn
behavior profiles and quickly spot abnormal behavior. Behavioral analysis
is especially useful for identifying insider threats, advanced persistent
threats (APTs), and zero-day attacks.

Threat Intelligence Integration: To improve their ability to detect threats, 
ML algorithms can take advantage of threat intelligence feeds, vulnerability 
databases, and security advisories. The identification of zero-day vulnera- 
bilities, new threats, and indicators of compromise (IOCs) can be enhanced 
by ML-based security solutions through the integration of external threat 
intelligence sources with internal security data.


10.10.2.3 Mitigating cyberattacks with machine learning


i. ML-based security systems automate incident response, allowing for
real-time identification and mitigation of cyber threats. Automated
response mechanisms, such as adaptive access restrictions, network
segmentation, and dynamic policy enforcement, may be implemented
using ML for threat analysis and risk assessment.

ii. Predictive Maintenance: ML algorithms can forecast security events
by analyzing past data, finding patterns of vulnerabilities, and
proactively applying preventative measures. Predictive maintenance
approaches help organizations anticipate and address security
problems before they become full-fledged attacks.

iii. Adaptive Defense Strategies: ML-based security solutions enable
adaptive defense strategies that dynamically adjust security measures
based on evolving threat landscapes and changing risk profiles.
Adaptive defense mechanisms, such as dynamic threat modeling,
adaptive authentication, and self-learning security controls,
continuously adapt to emerging threats and mitigate risks in real time.

iv. Threat Hunting and Attribution: ML algorithms can assist security
analysts in threat hunting and attribution by correlating disparate
security data, identifying attack patterns, and attributing malicious
activities to specific threat actors or groups. ML-driven threat hunting
techniques, such as behavior-based clustering, similarity analysis, and
link analysis, enhance the effectiveness of cybersecurity investigations
and incident response efforts.


10.10.2.4 Challenges and considerations 


While ML-based security solutions provide major benefits, there are several 
issues and concerns that must be addressed: 


i. Data Quality and Labeling: To learn successfully, ML systems require
high-quality, labeled training data. Ensuring the quality, completeness,
and relevance of training data is critical for developing robust ML
models for cybersecurity.

ii. Adversarial Attacks: Adversarial attacks aim to deceive ML models
by manipulating input data to produce incorrect outputs. Adversarial
robustness techniques, such as adversarial training, model ensembling,
and input perturbation, are essential for defending against adversarial
attacks in cybersecurity.

iii. Model Interpretability: Understanding and explaining the decisions
made by ML models is crucial for fostering trust, understanding model
behavior, and preventing bias or discrimination. Model interpretability
approaches, including feature importance analysis, model visualization,
and explanation generation, improve the transparency and accountability
of ML-based security solutions.

iv. Scalability and Performance: ML-based security systems must scale and
perform well enough to handle massive amounts of data, real-time
processing, and many attack scenarios. Approaches for improving
scalability and performance include optimizing ML algorithms,
establishing distributed computing infrastructure, and using cloud
services.


10.10.2.5 Best practices for implementing ML-based security solutions


i. Continuously monitor and assess ML models for performance, accu- 
racy, and efficacy. Regular updates, retraining, and validation are crit- 
ical for sustaining the relevance and reliability of ML-based security 
solutions over time. 

ii. Collaborative Threat Intelligence Sharing: Collaborative threat intel- 
ligence sharing among organizations, industries, and cybersecu- 
rity communities enhances collective defense against cyber threats. 
Sharing threat data, IOCs, and actionable intelligence enables timely 
detection and mitigation of cyberattacks. 

iii. Human-Machine Collaboration: Human-machine collaboration is 
essential for effective cybersecurity operations. ML-based security 
solutions should augment, rather than replace, human expertise and 
decision-making. Security analysts play a critical role in validating 
ML-generated alerts, investigating security incidents, and making
informed decisions based on ML-driven insights. 

iv. Privacy and Ethical Considerations: Protecting user privacy, respect- 
ing ethical principles, and adhering to regulatory requirements 


10.10.3 Enhancing resilience against network attacks 


Enhancing resilience against network attacks is crucial in today’s digital 
landscape, where cyber threats are continuously evolving and becoming 
more sophisticated. Network attacks pose significant risks to organiza- 
tions, governments, and individuals, leading to data breaches, financial 
losses, and reputational damage. To combat these threats effectively, it 
is essential to implement robust cybersecurity measures that enhance the 
resilience of network infrastructure and mitigate the impact of attacks. This
section explores strategies for enhancing resilience against network attacks,
including proactive defense measures, incident response strategies, and best
practices for building resilient network architectures.


10.10.3.1 Understanding network attacks 


Network attacks encompass a wide range of malicious actions intended to
compromise the availability, confidentiality, or integrity of network
resources. Typical kinds of network attacks include the following:




Denial of service (DoS) and distributed denial of service (DDoS) attacks:
These attacks overload servers and routers with excessive traffic, preventing
authorized users from accessing them.

Malware and ransomware attacks: Malicious software, such as viruses, worms,
and ransomware, infiltrates network systems to steal data, interfere with
operations, or demand ransom payments.

Phishing and social engineering attacks: Users are tricked into disclosing
private information, including login credentials or financial information,
by means of phishing emails, fake websites, and social engineering
techniques.


10.10.3.1.1 Strategies for enhancing resilience against network attacks 


i. Defense in depth is a security strategy that uses a variety of security
controls at multiple layers of the network architecture. These controls
include perimeter defenses such as firewalls and intrusion detection
systems (IDS), as well as internal controls such as network segmentation
and endpoint security solutions. By implementing multiple layers of
defense, organizations can identify and stop network attacks at different
stages of the cyber kill chain.

ii. Ongoing monitoring and threat detection are essential for recognizing
and reacting to network threats promptly. System logs, network traffic,
and user behavior must all be observed regularly. Anomaly detection
algorithms, security information and event management (SIEM) systems,
and intrusion detection and prevention systems (IDPS) aid in detecting
suspicious activity and possible security breaches. ML and artificial
intelligence techniques can further improve threat detection by analyzing
massive amounts of security data and identifying patterns suggestive of
harmful behavior.

iii. Hardening network infrastructure involves implementing security best
practices and configuration settings to reduce the attack surface and
minimize vulnerabilities. This includes regularly patching and updating
software, disabling unnecessary services and protocols, and implementing
strong protection mechanisms such as multi-factor authentication (MFA)
and encryption.

iv. Creating and regularly testing an incident response plan is essential
for responding to network threats efficiently and reducing their effects.
The incident response plan should specify the roles and responsibilities,
escalation procedures, communication protocols, and recovery actions to
follow in the case of a security incident. Simulations and tabletop
exercises help ensure that staff members are ready to react swiftly and
efficiently to cyberattacks [7].

v. Creating a security-aware culture inside an organization requires
educating staff members on social engineering techniques, threat
awareness, and cybersecurity best practices. Security awareness training
programs should address subjects including safe browsing practices,
password hygiene, and phishing awareness. Empowering employees to
identify and report security concerns can help organizations improve
their resistance to network intrusions.

vi. Collaboration and information sharing among organizations, industry
sectors, and cybersecurity communities are crucial for collectively
defending against network attacks. The sharing of threat intelligence,
attack indicators, and best practices enables organizations to identify
emerging threats, respond proactively, and enhance their overall
resilience against cyber threats.


10.10.3.2 Building resilient network architectures 


Building resilient network architectures requires a comprehensive approach
that combines technology, processes, and people. Fundamental principles for
constructing resilient network architectures include the following:

i. Redundancy and Failover Mechanisms: Deploying redundant network
components, such as backup servers, redundant power supplies, and
failover mechanisms, improves availability and reduces downtime in the
event of a network failure or attack.

ii. Scalability and Elasticity: Designing network architectures that are
scalable and elastic equips organizations to accommodate escalating
traffic demands and adapt to evolving business requirements. Cloud-based
solutions and virtualized network functions foster scalability and
agility in response to developing threats.

iii. Automation and Orchestration: Automating standard network
administration tasks, including provisioning, configuration, and
monitoring, reduces the possibility of human error and increases
operational efficiency. Organizations can mitigate the effects of network
attacks and respond quickly to security problems by using orchestration
technologies and automation frameworks.

iv. Resilient Communication Protocols: Secure communication protocols, such
as IPsec, Secure Shell (SSH), and Transport Layer Security (TLS), are
used to prevent unauthorized access to network communications and to
protect data in transit. These protocols help ensure the authenticity,
confidentiality, and integrity of data communicated across a network.

v. Continuous Evaluation and Improvement: Continuously assessing the
effectiveness of network security measures and adapting to emerging
threats is indispensable for maintaining resilience against network
attacks. Regular security assessments, penetration testing, and risk
assessments help identify vulnerabilities in the network architecture
and prioritize security investments accordingly.
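As a small illustration of the secure communication protocols listed above, the following sketch configures a TLS client with Python's standard `ssl` module. It is a minimal example under stated assumptions, not a hardening guide: the function name `make_client_context` and the choice of TLS 1.2 as the minimum version are illustrative.

```python
import ssl


def make_client_context():
    """Build a TLS client context that verifies the server certificate and host name."""
    # create_default_context() loads the system CA store and enables
    # certificate and host name verification for client-side use.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A socket would then be wrapped with `context.wrap_socket(sock, server_hostname=host)` so that data in transit is encrypted and the peer is authenticated.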


10.11 CHALLENGES, FUTURE DIRECTIONS, AND EMERGING TRENDS


A number of difficulties and roadblocks must be addressed in order to fully
realize the promise of blockchain technology in DL systems when examining
the convergence of ML and DL with blockchain technology for intelligence
applications. The main observations and recommendations are as follows:

i. Blockchain's data traceability, immutability, and integrity features
have considerable potential for monitoring the volume and type of data
used to train DL models. However, existing blockchain systems struggle to
handle data quality problems efficiently, especially in sensitive
industries such as transportation and healthcare.

ii. The efficacy of current blockchain-based DL systems is significantly
affected by key factors such as system throughput, execution delay,
block propagation time, data volume, competing interests among
participants, and smart contract vulnerabilities.

iii. Public blockchain platforms are vulnerable to data privacy breaches
because they lack access control, whereas private blockchain platforms
provide data privacy through private channels and access control
policies. However, public blockchain systems are better at recording the
development of DL models at every step of their construction,
modification, or application.

iv. The scale of the blockchain network has a significant impact on the
scalability of blockchain-based applications. These applications can
function more efficiently when DL techniques are used for data
compression and redundant data minimization, especially as networks grow
larger.


10.11.1 Future directions and emerging trends 


1. The future directions and emerging trends in the convergence of
blockchain with ML and DL have the potential to significantly affect
intelligence applications across a variety of industries, building upon
the challenges and suggestions previously discussed. This section
outlines some possible directions for further study and innovation.

2. Improved Data Quality Control: Future research should focus principally
on creating robust methods that guarantee data quality in
blockchain-based DL systems. To address issues with data correctness,
completeness, and consistency, this includes investigating methods for
data validation, verification, and cleansing. Furthermore, developments
in lineage tracing and data provenance can offer useful information about
the origin and evolution of data within blockchain networks.

3. Optimized Performance Metrics: Further investigation is needed to improve
performance metrics in blockchain-based DL systems, in particular system
throughput, execution latency, and block propagation time. Consensus
mechanisms such as proof of stake (PoS) and scaling techniques such as
sharding could increase efficiency and scalability. Computational
performance may also be improved by developments in hardware acceleration
technologies, such as quantum computing or specialized DL processors.

4. Privacy-Preserving Techniques: Given the importance of data privacy,
further advances in privacy-preserving methods for blockchain-based DL
systems are needed. This involves investigating cryptographic techniques
such as homomorphic encryption and secure multi-party computation (MPC)
to allow private and secure data sharing and computation without
sacrificing confidentiality. Additionally, user privacy can be improved,
while preserving the integrity of blockchain transactions, by integrating
differential privacy methods and zero-knowledge proofs (ZKPs).

5. Scalability Solutions: Scalability remains a significant challenge for 
blockchain networks, particularly as the size and complexity of DL 
models and datasets continue to increase. Research efforts should 
focus on developing scalable solutions for DL systems based on block- 
chain, including off-chain computation, layer-two scaling solutions 
(e.g., state channels and sidechains), and network optimization tech- 
niques. Moreover, advancements in interoperability protocols and 
cross-chain communication mechanisms can facilitate seamless data 
exchange and collaboration across diverse blockchain networks. 

6. Interdisciplinary Collaboration: Experts in blockchain technology, ML,
cryptography, and domain-specific areas should collaborate on future
research projects. By fostering interdisciplinary collaboration, scholars
can draw on a range of viewpoints and expertise to tackle intricate
problems and stimulate innovation in blockchain-based DL applications.
Research consortia, hackathons, and open-source communities are examples
of collaborative efforts that can promote information exchange and
accelerate scientific progress.

7. Legal and Ethical Issues: As blockchain-based DL systems develop further,
legal and ethical issues pertaining to data security, privacy, and
responsibility must be addressed. Future research should focus on
formulating standards, guidelines, and regulatory frameworks to ensure
the responsible implementation and use of blockchain technology in
intelligence applications. Initiatives that support accountability,
fairness, and transparency in algorithmic decision-making can also help
reduce risks and ensure the ethical application of blockchain-based DL
systems.


In conclusion, the future directions and emerging trends in the integration
of ML/DL with blockchain hold great potential for revolutionizing
intelligence applications across industries. By tackling important issues,
embracing innovation, and promoting multidisciplinary collaboration,
researchers and practitioners can fully realize the potential of
blockchain-based DL systems and drive significant improvements in
automation, decision-making, and intelligence analysis.
Efficient resource allocation in cloud-fog Internet of Things (IoT) networks using metaheuristic scheduling algorithm

Ruqqaiya Begum and Ganesh Reddy Karri


11.1 INTRODUCTION


Cloud-Fog Computing (CFC) has revolutionized the landscape of distrib- 
uted computing by offering a flexible and scalable infrastructure for hosting 
diverse applications and services [1]. This paradigm leverages both central- 
ized cloud data centers and distributed fog nodes located closer to end-users 
to efficiently process and manage computing tasks. Task Scheduling (TS) 
plays a pivotal role in optimizing resource utilization, minimizing latency, 
and enhancing overall system performance in CFC environments. 

Traditional TS approaches often face challenges in dynamically allocat- 
ing resources to meet the varying demands of applications and users. The 
inherent complexity, heterogeneity, and dynamic nature of CFC environ- 
ments require innovative optimization techniques to achieve efficient TS [2]. 
In recent years, metaheuristic optimization algorithms have garnered sig- 
nificant attention due to their ability to efficiently search large solution 
spaces and adapt to changing environmental conditions. 

In this context, this research focuses on proposing a novel approach for 
TS in CFC environments utilizing the Harmony Search Mexican Axolotl 
Optimization (HSMAO) algorithm. Inspired by the harmonious behavior 
of Mexican axolotls in their natural habitat, HSMAO aims to strike a bal- 
ance between exploration and exploitation to effectively optimize TS in 
dynamic and heterogeneous computing environments [3]. 

The unique characteristics of HSMAO make it well-suited for addressing 
the challenges associated with TS in CFC [4]. By dynamically adapting to 
changing workload conditions and resource availability, HSMAO seeks to 
optimize resource allocation, minimize response time, and reduce energy 
consumption in CFC environments [5]. 

In this chapter, we present a comprehensive investigation of the proposed
HSMAO algorithm for TS in CFC environments. We evaluate its performance
against state-of-the-art scheduling algorithms using various performance
metrics and workload scenarios. The results demonstrate the effectiveness
and robustness of HSMAO in optimizing TS, thereby contributing to the
advancement of resource allocation strategies in CFC. The contributions of
this study are as follows:


• A novel optimization approach: HSMAO introduces a unique blend of
harmony search (HS) and Mexican axolotl-inspired principles for CFC TS,
addressing resource allocation challenges in dynamic environments.

• HSMAO optimizes resource allocation, minimizing response time and energy
consumption in CFC systems by dynamically adapting to workload changes.

• This study evaluates HSMAO against state-of-the-art algorithms,
showcasing its effectiveness and providing insights for designing
efficient resource allocation in dynamic computing environments.


The remainder of this chapter is organized as follows. Section 11.2, Related
Work, reviews existing research on TS in CFC environments. Section 11.3
presents the system model and problem formulation and details the design of
HSMAO. Section 11.4, Results, compares the performance of HSMAO with
existing algorithms. Finally, Section 11.5, Conclusion, summarizes the
findings, discusses implications, and outlines future research directions.


11.2 RELATED WORK 


Table 11.1 summarizes the literature survey conducted for TS in CFC and
provides an overview of existing research in this domain. The survey covers
studies that investigate different aspects of TS techniques, optimization
algorithms, and their applications in CFC environments. By synthesizing
findings from the reviewed literature, the survey identifies gaps, trends,
and challenges in current research and motivates the introduction and
evaluation of the HSMAO algorithm. This systematic examination of prior work
lays the groundwork for understanding the state of the art of TS in CFC and
highlights the need for novel optimization approaches such as HSMAO to
address the evolving demands of dynamic and heterogeneous computing
environments.
Table 11.1 Analysis of scheduling parameters and techniques used in a hybrid cloud-fog scenario

| Ref no. | Technique used | Parameters addressed | Limitations |
|---|---|---|---|
| [6] | Priority-based ant colony optimization (PBACO) | Makespan, cost, deadline violation rate, and resource utilization | The proposed technique does not give accurate results for multiple parameters during execution |
| [7] | Multi-component path-covering problem (MCPCPP) | Execution cost | Does not address execution time or data transfer time for large data workflows in multi-clouds |
| [8] | Least total response area (LTRA) | Energy consumption and task reduction time | Considers only a limited number of parameters |
| [9] | Deep reinforcement learning (DRL) | Response time, success rate, and cost | Does not focus on real-world problems |
| [10] | Modified particle swarm optimization or multi-objective particle swarm optimization (M-PSO) | Monetary cost | Metaheuristic optimization methods for multi-cloud environments are not considered |
| [11] | Improved Jumping Frog Algorithm (IJFA) | Makespan time, execution costs, and resource utilization | Does not address a multi-objective model that lowers the response time and energy consumption of the integrated fog-cloud architecture |
| [12] | Hybrid Genetic Algorithm (HGA) | Cost and execution time | Large data center space is not considered |
| [13] | Opposition-Based Chaotic Whale Optimization Algorithm (OppoCWOA) | Energy consumption and QoS parameters | Data privacy with the help of the blockchain-based FogBus platform was not investigated |
| [14] | Enhanced version of the Hunter algorithm (HunterPlus) | Energy consumption and job completion rate | Records only one snapshot of the system's performance at any given moment |
| [15] | Hybrid Metaheuristic Algorithm (HMA) | Task completion rate and power consumption | Does not resolve task flow scheduling for intelligent manufacturing lines |
| [16] | Chaotic multi-agent system or cooperative multi-agent system (CMAS) | Cost, makespan, and schedule length | Not deployed in real-world systems; CEP is seen as a viable use for the IoT |
| [17] | Priority-based self-configuring swarm optimization (PSCSO) | Makespan and energy consumption | Multi-objective optimization is not considered; the proposed approach does not give good results |
| [18] | Dynamic Voltage and Frequency Scaling (DVFS) | Energy consumption | Some crucial elements, such as privacy metrics and trust, are not covered |
| [19] | Two-stage flow control (TSFC) | Total execution time and average waiting time | Other problems, including match difficulties, string mapping, and oblivious RAM, are not addressed by this technique |
| [20] | Cat swarm optimization or crow search optimization (CSO) | Energy consumption and resource utilization | Pre-emptive job scheduling is not addressed and is left for investigation in light of the need for job relocation |
| [21] | Energy-Efficient Optimization Algorithm (EEOA) | Cost, makespan, and energy consumption | The proposed technique performs best only in terms of energy consumption; other metrics are not adequately addressed |
| [22] | EAEFA | Makespan time, response time, execution time, and energy usage | The QoS of the proposed technique did not meet the scheduling requirements |

11.3 SYSTEM MODEL AND PROBLEM FORMULATION 


11.3.1 System model 


The system model for TS in CFC, employing the HSMAO scheduler, begins with
the collection of Internet of Things (IoT) data from end devices. This data
collection phase gathers information from sensors, devices, and endpoints
distributed across the network. The collected data is then classified
according to the characteristics of the tasks it represents. Specifically,
tasks are categorized into two main types: delay-sensitive tasks and
non-delay-sensitive tasks. This classification is crucial for determining
the appropriate allocation of computational resources in the CFC
environment (Figure 11.1).

The classification can be expressed as an equation that assigns a binary
value to each task. Let $DS_i$ represent the delay sensitivity of task $i$.
Equation (11.1) assigns $DS_i$ a value of 1 if task $i$ is determined to be
delay-sensitive based on predefined criteria such as its deadline,
sensitivity to latency, or application requirements; otherwise, $DS_i$ is
assigned a value of 0. This binary classification enables the system to
distinguish between tasks that require immediate processing and those that
can tolerate longer processing times, guiding the TS decisions in CFC
environments to optimize resource allocation and meet application
performance objectives.


$$DS_i = \begin{cases} 1, & \text{if task } i \text{ is delay-sensitive} \\ 0, & \text{if task } i \text{ is non-delay-sensitive} \end{cases} \tag{11.1}$$
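The binary classification in Equation (11.1) can be sketched in a few lines. The chapter leaves the "predefined criteria" open, so this sketch assumes a single hypothetical criterion, a deadline threshold in milliseconds; the `Task` fields, the 100 ms default, and the function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    deadline_ms: float  # assumed criterion: time allowed before the result is needed


def delay_sensitivity(task, deadline_threshold_ms=100.0):
    """Return DS_i: 1 if the task is delay-sensitive, 0 otherwise."""
    return 1 if task.deadline_ms <= deadline_threshold_ms else 0


def target_tier(task, deadline_threshold_ms=100.0):
    """Map DS_i to a tier: delay-sensitive tasks go to fog VMs, the rest to cloud VMs."""
    return "fog" if delay_sensitivity(task, deadline_threshold_ms) == 1 else "cloud"


tasks = [Task(1, 50.0), Task(2, 500.0)]
for task in tasks:
    print(task.task_id, delay_sensitivity(task), target_tier(task))
```

A real classifier could combine several criteria (deadline, latency sensitivity, application requirements), but the output would remain the same binary indicator that the scheduler consumes.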


Figure 11.1 Harmony Search Mexican Axolotl Optimization architecture
(components shown: task classification, task prioritization, delay-sensitive
tasks, non-delay-sensitive tasks, random workflow, and HSMAO scheduler).
In the system model, delay-sensitive tasks are assigned to fog virtual
machines (VMs) for execution. Fog nodes, located closer to the edge of the
network, host these VMs to minimize latency and ensure timely processing of
time-critical tasks. Fog computing offers proximity to end devices, enabling
faster response times and reduced network congestion for delay-sensitive
applications. In contrast, non-delay-sensitive tasks are scheduled on cloud
VMs residing in centralized data centers. These tasks typically have less
stringent latency requirements and can tolerate longer processing times.
Cloud data centers provide ample computational resources and storage
capacity, making them suitable for handling bulk processing and
storage-intensive tasks.

Equations (11.2) and (11.3) give the total number of tasks allocated to fog
and cloud VMs, respectively. Each total is computed by summing the number of
tasks assigned to the individual VMs. For example,
$\text{Total fog}_{\text{tasks}}$ is calculated by adding the number of tasks
assigned to each fog VM, denoted $\text{fog}_1$, $\text{fog}_2$, and so on,
up to $\text{fog}_n$. Similarly, $\text{Total cloud}_{\text{tasks}}$ is
calculated by summing the number of tasks assigned to each cloud VM, denoted
$\text{cloud}_1$, $\text{cloud}_2$, and so on, up to $\text{cloud}_m$. These
expressions provide a straightforward way to compute the total workload
allocated to fog and cloud VMs, which is useful for analyzing resource
utilization, workload distribution, and system performance in CFC
environments.


$$\text{Total fog}_{\text{tasks}} = \text{fog}_1 + \text{fog}_2 + \cdots + \text{fog}_n \tag{11.2}$$

$$\text{Total cloud}_{\text{tasks}} = \text{cloud}_1 + \text{cloud}_2 + \cdots + \text{cloud}_m \tag{11.3}$$
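The two totals in Equations (11.2) and (11.3) can be computed directly from a task-to-VM assignment. The sketch below is only a worked illustration: the representation of an assignment as a mapping from task id to a `(tier, vm_index)` pair and the name `workload_totals` are assumptions made for this example.

```python
from collections import Counter


def workload_totals(assignment):
    """Count tasks per VM and sum them per tier.

    `assignment` maps a task id to a (tier, vm_index) pair, with tier "fog" or "cloud".
    """
    per_vm = Counter(assignment.values())  # number of tasks on each VM
    total_fog = sum(count for (tier, _), count in per_vm.items() if tier == "fog")
    total_cloud = sum(count for (tier, _), count in per_vm.items() if tier == "cloud")
    return per_vm, total_fog, total_cloud


# Hypothetical assignment: three tasks on two fog VMs, one task on a cloud VM.
assignment = {1: ("fog", 1), 2: ("fog", 1), 3: ("fog", 2), 4: ("cloud", 1)}
per_vm, total_fog, total_cloud = workload_totals(assignment)
print(dict(per_vm), total_fog, total_cloud)
```

Keeping the per-VM counts alongside the totals makes it easy to spot an unbalanced distribution, for example one fog VM holding most of the delay-sensitive tasks.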


The core component of the system model is the HSMAO scheduler, which 
orchestrates the allocation of tasks to fog and cloud VMs based on their char- 
acteristics and system constraints. HSMAO leverages a combination of HS 
and Mexican axolotl-inspired optimization principles to achieve efficient TS. 
By dynamically adapting to changing workload conditions, resource 
availability, and task priorities, HSMAO optimizes resource utilization, 
minimizes response times, and enhances overall system performance. 
The integration of HSMAO into the system model facilitates adaptive and 
intelligent TS in CFC environments, catering to the diverse needs of IoT 
applications and end-users. 


11.3.1.1 Random workflow 


In TS for CFC using the HSMAO algorithm, a random workflow usage strategy
plays a crucial role in efficiently allocating computational resources and
optimizing task execution. Workflows in CFC environments are often
represented as Directed Acyclic Graphs (DAGs), where nodes represent tasks
and edges denote dependencies between tasks. A random workflow usage
strategy involves dynamically selecting workflow instances from a pool of
available workflows based on factors such as task characteristics, resource
availability, and system constraints. By randomly selecting workflow
instances, the task scheduler can adapt to changing workload conditions and
effectively balance resource utilization across fog nodes and cloud data
centers (Figure 11.2).

Figure 11.2 Random workflow.

Directed Acyclic Graphs (DAGs) provide a structured representation of 
task dependencies and execution order in CFC environments. Each node in 
the DAG represents a task, while edges between nodes signify dependen- 
cies between tasks. DAGs facilitate efficient TS by capturing the sequential 
and parallel execution requirements of workflows. Tasks with no incoming 
edges (i.e., no dependencies) can be executed concurrently, while tasks with 
dependencies must wait for their predecessors to complete before execution. 
By analyzing the DAG structure and dependencies, the task scheduler can 
devise an optimal scheduling strategy to minimize response time, maximize 
resource utilization, and meet task deadlines in CFC environments. 
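The dependency rule described above (tasks without unfinished predecessors may run concurrently, all others must wait) can be illustrated with Python's standard `graphlib` module. This is a minimal sketch of DAG-ordered execution, not the HSMAO scheduler; the task names and the helper `execution_waves` are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; every task in a wave can run concurrently.

    `dependencies` maps each task to the set of tasks it must wait for.
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose predecessors are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as finished
    return waves


# Hypothetical workflow: B and C depend on A, and D depends on both B and C.
workflow = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(execution_waves(workflow))
```

Each wave is a set of mutually independent tasks, so a scheduler is free to spread a wave across fog and cloud VMs before moving on to the next one.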

The random workflow usage strategy in TS using HSMAO algorithm 
leverages the flexibility and adaptability of DAG representations to dynami- 
cally allocate tasks to fog nodes and cloud data centers. By randomly select- 
ing workflow instances based on DAG characteristics, the task scheduler 
can effectively balance the computational workload and optimize resource 
utilization. Additionally, the use of DAGs enables the task scheduler to 
exploit parallelism and concurrency in task execution, thereby improving 
overall system performance and efficiency in CFC environments.


11.3.2 Problem formulation


The TS problem in CFC aims to efficiently allocate computational resources 
to execute a set of tasks while minimizing response time, maximizing 
resource utilization, and meeting task deadlines. Formally, let T represent
the set of tasks to be scheduled, where each task $t_i$ is characterized by
its computational requirement $C_i$, deadline $D_i$, priority $P_i$, and
sensitivity to delay. Additionally, let F denote the set of fog nodes and C
represent the
set of cloud data centers available in the system. The objective is to assign 
each task to a suitable computing resource (either fog node or cloud data 
center) and determine the optimal scheduling strategy to minimize overall 
response time and maximize resource utilization, while ensuring that all 
task deadlines are met. 

The time complexity of the HSMAO algorithm mainly depends on the 
number of iterations required to converge to an optimal solution. Typically, 
the time complexity of HSMAO can be expressed as $O(N \cdot M)$, where N
represents the population size and M denotes the maximum number of 
iterations. Additionally, the complexity of individual operations within 
HSMAO, such as evaluating the fitness function and updating solution vec- 
tors, may vary depending on the specific implementation details. 

In the hybrid approach combining HS and Mexican Axolotl Optimization 
(MAO), the time complexity is influenced by the iterations performed by 
each algorithm and the interactions between them. Assuming $N_{HS}$ and
$N_{MAO}$ as the population sizes for HS and MAO, respectively, and
$M_{HS}$ and $M_{MAO}$ as the maximum number of iterations for each
algorithm, the overall time complexity can be expressed as shown in
Equation 11.4. Additionally, the computational overhead incurred by the
interaction between HS and MAO should also be considered in the time
complexity analysis.


$O\big((N_{HS} + N_{MAO}) \cdot (M_{HS} + M_{MAO})\big)$ (11.4)
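
As a purely illustrative calculation of the bound in Equation 11.4, the following worked example uses assumed parameter values (population sizes of 30 and 20 and 100 iterations per algorithm). These values are hypothetical and are not settings reported in this chapter.

```latex
N_{HS} = 30,\quad N_{MAO} = 20,\quad M_{HS} = M_{MAO} = 100
\;\Longrightarrow\;
(N_{HS} + N_{MAO}) \cdot (M_{HS} + M_{MAO}) = 50 \cdot 200 = 10{,}000
```

Under these assumptions the bound grows on the order of $10^4$ solution updates, before accounting for the cost of each fitness evaluation.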


11.3.2.1 Objective function 


The objective function for TS in CFC using the HSMAO algorithm aims 
to minimize overall response time while maximizing resource utilization 
and meeting task deadlines. Mathematically, the objective function f can
be defined as follows:

$f = \sum_{i=1}^{n} \left[ w_1 R_i + w_2 U_i + w_3 E_i + w_4 \left( 1 - \frac{D_i}{C_i} \right) \right]$ (11.5)
where

• n is the total number of tasks.
• $R_i$ represents the response time of task i, which is the time taken for
  the task to be completed.
• $U_i$ denotes the resource utilization of the computing resource
  allocated to task i.
• $E_i$ is the consumed energy of task i.
• $D_i$ is the deadline of task i.
• $C_i$ represents the computational requirement of task i.
• $w_1$, $w_2$, $w_3$, and $w_4$ are weighting factors that determine the
  relative importance of response time, resource utilization, energy
  consumption, and meeting deadlines, respectively.


The objective function aims to strike a balance among response time,
resource utilization, energy consumption, and the timely completion of
tasks. The weighting factors $w_1$, $w_2$, $w_3$, and $w_4$ can be
adjusted based on the specific requirements and priorities of the CFC
environment. Optimizing this objective function using the HSMAO algorithm
will result in an efficient TS strategy that enhances system performance and 
meets the desired quality of service objectives. 
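
To make the role of the weighting factors concrete, the following minimal sketch evaluates a weighted-sum objective over a list of tasks. It assumes one reading of Equation 11.5, namely a plain sum of the weighted response time, resource utilization, and energy terms plus a deadline term of the form 1 - D_i / C_i. The class name `TaskMetrics`, the sample values, and the weights are hypothetical, and the sign conventions are an assumption rather than a confirmed part of the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class TaskMetrics:
    response_time: float        # R_i
    utilization: float          # U_i
    energy: float               # E_i
    deadline: float             # D_i
    compute_requirement: float  # C_i


def objective(tasks, w1, w2, w3, w4):
    """Weighted-sum objective over all tasks (assumed reading of Equation 11.5)."""
    total = 0.0
    for t in tasks:
        deadline_term = 1.0 - t.deadline / t.compute_requirement
        total += (w1 * t.response_time + w2 * t.utilization
                  + w3 * t.energy + w4 * deadline_term)
    return total


# Hypothetical metrics for two tasks and an arbitrary choice of weights.
sample = [
    TaskMetrics(4.0, 0.5, 10.0, 8.0, 16.0),
    TaskMetrics(6.0, 0.25, 20.0, 5.0, 10.0),
]
score = objective(sample, w1=0.4, w2=0.2, w3=0.3, w4=0.1)
```

Raising one weight relative to the others shifts the search toward solutions that favor the corresponding metric, which is how the objective can be tuned to a given CFC deployment.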


11.3.2.2 Resource utilization


Resource utilization, $U_i$, is mathematically defined as the ratio of the
computational requirement of task i, denoted by $C_i$, to the total
available computational resource of the allocated computing resource,
represented as $C_{total}$:


$U_i = \frac{C_i}{C_{total}}$ (11.6)


The above equation quantifies the proportion of the allocated computational
resource utilized by task i. This ratio serves as a measure of how
efficiently the computing resource is being utilized for task execution. A 
resource utilization value close to 1 indicates optimal utilization, where 
the task fully consumes the allocated resource, while a value closer to 0 
suggests underutilization. Efficient resource utilization is crucial for maxi- 
mizing system performance and ensuring effective allocation of computing 
resources in CFC environments. 
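
As a small numerical illustration of Equation 11.6, the sketch below computes the utilization ratio for two cases. The function name and the sample figures (expressed in MIPS) are assumptions made for this example only.

```python
def resource_utilization(task_requirement, total_capacity):
    """U_i = C_i / C_total for the resource allocated to task i."""
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return task_requirement / total_capacity


# Hypothetical values in MIPS: a light task and a task that saturates the VM.
u_small = resource_utilization(500, 4000)
u_full = resource_utilization(4000, 4000)
```

Here the first task leaves most of the VM idle (a ratio of 0.125), while the second consumes the full capacity (a ratio of 1.0).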


11.3.2.3 Energy consumption


The energy consumption, $E_i$, in CFC can be expressed as the product of
the power consumption rate of the computing resource allocated to task i
and the execution time of the task:


$E_i = P_i \times T_i$ (11.7)


where $P_i$ represents the power consumption rate and $T_i$ denotes the
execution time of task i. This equation quantifies the energy consumed by
task i during its execution. By considering both the power consumption
rate and the execution time, the equation provides a comprehensive
measure of energy consumption for individual tasks in CFC environments.
Reducing energy consumption is critical for improving the sustainability 
and cost-effectiveness of CFC systems, making this equation valuable for 
optimizing energy efficiency in TS decisions. 
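
The trade-off captured by Equation 11.7 can be seen in a short sketch that compares the same task on a slower, low-power resource and a faster, higher-power one. The power figures, the task length, and the assumption that execution time equals task length in million instructions divided by VM speed in MIPS are all hypothetical choices for this example.

```python
def execution_time(task_length_mi, vm_mips):
    """Execution time in seconds, assuming time = length (MI) / speed (MIPS)."""
    return task_length_mi / vm_mips


def energy_consumption(power_watts, exec_time_s):
    """E_i = P_i * T_i, in joules when power is in watts and time in seconds."""
    return power_watts * exec_time_s


# Hypothetical task of 8,000 MI on a fog VM (2,000 MIPS, 50 W)
# and on a cloud VM (4,000 MIPS, 150 W).
task_length = 8000
fog_energy = energy_consumption(50.0, execution_time(task_length, 2000))
cloud_energy = energy_consumption(150.0, execution_time(task_length, 4000))
```

With these assumed figures the cloud VM finishes in half the time but consumes more energy (300 J versus 200 J), which shows why the scheduler has to weigh energy against response time rather than always choosing the fastest resource.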


11.3.2.4 Response time 


The response time, $R_i$, in CFC is defined as the total time taken for
task i to complete its execution, including both processing time and
waiting time:


$R_i = W_i + T_i$ (11.8)


where $T_i$ represents the processing time of task i and $W_i$ denotes the waiting
time, which is the time spent in queues or waiting for resources to become 
available. This equation provides a comprehensive measure of the time 
taken for a task to be fully executed, taking into account both computa- 
tional processing and queuing delays. Minimizing response time is essential 
for improving system performance and meeting the responsiveness require- 
ments of applications and end-users in CFC environments. Therefore, opti- 
mizing this equation is a key objective in TS decisions to enhance overall 
system efficiency and user experience. 
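
The effect of queuing on Equation 11.8 can be illustrated with a minimal sketch. It assumes a single resource that serves tasks in first-in, first-out order and that all tasks arrive at time zero, so the waiting time of each task is the total processing time of the tasks ahead of it. The queuing model and the sample processing times are assumptions for illustration, not part of the chapter's simulation setup.

```python
def fifo_response_times(processing_times):
    """Return R_i = W_i + T_i for tasks served in order on one resource."""
    responses = []
    elapsed = 0.0
    for t in processing_times:
        waiting = elapsed              # W_i: time spent behind earlier tasks
        responses.append(waiting + t)  # R_i = W_i + T_i
        elapsed += t
    return responses


# Hypothetical processing times in seconds for three queued tasks.
responses = fifo_response_times([2.0, 3.0, 1.0])
```

The last task needs only one second of processing but has a response time of six seconds, because it waits five seconds in the queue. Spreading tasks across more fog and cloud resources reduces this waiting component.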


11.3.2.5 Proposed algorithm


The following algorithm presents the pseudo-code for the proposed HSMAO
algorithm.


Input: task characteristics, energy consumption, resource utilization,
       and response time.

Initialize population of harmonies and axolotls.
Evaluate fitness of each harmony and axolotl based on objective function.
Repeat until convergence:
    Perform HS phase:
        Update harmonies using pitch adjustment and memory consideration.
        Evaluate fitness of updated harmonies.
    Perform MAO phase:
        Update axolotls using cooperation and adaptation principles.
        Evaluate fitness of updated axolotls.
    Combine harmonies and axolotls.
    Evaluate fitness of combined solutions.
    Select the best solutions for next iteration.
Determine allocation of tasks to fog and cloud VMs based on solution
obtained.
Execute allocated tasks on respective VMs.
Repeat scheduling process as needed to adapt to changing workload
conditions.

Output: the best solution obtained.
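
The overall structure of the pseudo-code can be sketched in Python as follows. This is a simplified illustration under several stated assumptions and not the authors' implementation: a solution is a list that maps each task to a VM index, the fitness is a placeholder makespan rather than the weighted objective, a fixed iteration count replaces the convergence test, and `axolotl_step` is a simplified stand-in that does not reproduce the published MAO operators. All function names, parameter values, and sample data are hypothetical.

```python
import random


def makespan(assignment, task_lengths, vm_mips):
    """Placeholder fitness: finishing time of the busiest VM (lower is better)."""
    load = [0.0] * len(vm_mips)
    for task, vm in enumerate(assignment):
        load[vm] += task_lengths[task] / vm_mips[vm]
    return max(load)


def hs_step(harmonies, n_vms, rng, hmcr=0.9, par=0.3):
    """Build one new harmony via memory consideration and pitch adjustment."""
    new = []
    for j in range(len(harmonies[0])):
        if rng.random() < hmcr:
            value = rng.choice(harmonies)[j]       # memory consideration
            if rng.random() < par:                 # pitch adjustment
                value = (value + rng.choice([-1, 1])) % n_vms
        else:
            value = rng.randrange(n_vms)           # random selection
        new.append(value)
    return new


def axolotl_step(axolotl, best, n_vms, rng, copy_prob=0.5, regen_prob=0.1):
    """Simplified stand-in for the MAO update (not the published operators)."""
    new = []
    for own, lead in zip(axolotl, best):
        if rng.random() < regen_prob:
            new.append(rng.randrange(n_vms))       # regenerate this position
        elif rng.random() < copy_prob:
            new.append(lead)                       # adopt the best solution's choice
        else:
            new.append(own)                        # keep the current choice
    return new


def hybrid_schedule(task_lengths, vm_mips, pop_size=10, iterations=200, seed=0):
    rng = random.Random(seed)
    n_tasks, n_vms = len(task_lengths), len(vm_mips)

    def fitness(solution):
        return makespan(solution, task_lengths, vm_mips)

    def random_solution():
        return [rng.randrange(n_vms) for _ in range(n_tasks)]

    harmonies = [random_solution() for _ in range(pop_size)]
    axolotls = [random_solution() for _ in range(pop_size)]
    for _ in range(iterations):
        best = min(harmonies + axolotls, key=fitness)
        # Combine both populations with the newly generated candidates.
        candidates = harmonies + axolotls
        candidates.append(hs_step(harmonies, n_vms, rng))
        candidates.extend(axolotl_step(a, best, n_vms, rng) for a in axolotls)
        # Select the best solutions for the next iteration.
        candidates.sort(key=fitness)
        harmonies = candidates[:pop_size]
        axolotls = candidates[pop_size:2 * pop_size]
    return harmonies[0], fitness(harmonies[0])


# Hypothetical workload: six task lengths in MI and three VM speeds in MIPS.
lengths = [4000, 8000, 2000, 6000, 10000, 3000]
mips = [2000, 4000, 3000]
best_assignment, best_makespan = hybrid_schedule(lengths, mips)
```

Because the previous populations are always kept among the candidates, the best solution found so far is never lost between iterations. Replacing `makespan` with a weighted objective would bring the sketch closer to the formulation in Section 11.3.2.1.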


11.4 RESULTS AND DISCUSSION 


11.4.1 Results 


The simulations were carried out with the CloudSim toolkit, with VMs
deployed in both cloud and fog computing environments. For cloud VMs, the
total number of VMs varies from 15 to 25, with computing power ranging
from 2,000 to 4,000 MIPS. RAM capacities for cloud VMs span from 5,000 to
20,000 MB, while bandwidth varies from 512 to 4,096 Mbps. For fog VMs, the
total VM count ranges from 10 to 20, with the same computing power range
as cloud VMs (2,000 to 4,000 MIPS) but a smaller RAM capacity of 250 to
5,000 MB. Fog VMs also have lower bandwidth, ranging from 128 to 1,024
Mbps, reflecting the constraints of edge computing resources.

In our proposed approach, we compare the results obtained using the 
HSMAO algorithm with existing optimization algorithms including MAO, 
Harmony Search Optimization (HSO), and Ant Colony Optimization (ACO). 
Through comprehensive evaluations, we analyze various performance met- 
rics such as response time, resource utilization, and meeting task deadlines. 
Our results demonstrate that HSMAO outperforms existing algorithms by 
efficiently allocating tasks based on their characteristics and system con- 
straints. HSMAO effectively balances workload distribution between cloud 
and fog resources, optimizing resource utilization while ensuring timely task 
completion and meeting quality of service requirements. 


11.4.1.1 Resource utilization 


In terms of resource utilization, our proposed approach utilizing the 
HSMAO algorithm has yielded notably superior results when compared to 
existing optimization algorithms such as MAO, HSO, and ACO. Through 
meticulous evaluations, HSMAO has demonstrated its efficacy in opti- 
mizing resource allocation by efficiently distributing tasks across cloud 
and fog computing environments. By dynamically adapting to changing
workload conditions and system constraints, HSMAO effectively maximizes
resource utilization while minimizing response time and ensuring timely
task completion. These results underscore the effectiveness of HSMAO in
addressing the challenges of TS in CFC environments, making it a promising
approach for enhancing system performance and scalability (Figure 11.3).

Figure 11.3 Calculation of resource utilization.


11.4.1.2 Energy consumption 


Our proposed approach utilizing the HSMAO algorithm has demon- 
strated superior performance compared to existing optimization algo- 
rithms such as MAO, HSO, and ACO. Through rigorous evaluations, 
HSMAO has effectively optimized TS decisions to minimize energy con- 
sumption while meeting performance objectives in CFC environments. 
By dynamically allocating tasks based on workload characteristics and 
system constraints, HSMAO optimizes resource utilization, reduces idle 
time, and minimizes unnecessary energy consumption. These findings 
highlight the effectiveness of HSMAO in achieving energy-efficient TS, 
contributing to the sustainability and cost-effectiveness of CFC systems 
(Figure 11.4).


11.4.1.3 Response time


In terms of response time, the HSMAO algorithm has demonstrated
significant improvements compared to existing optimization algorithms
such as MAO, HSO, and ACO. Through
comprehensive evaluations, HSMAO has effectively minimized response 
times by dynamically allocating tasks based on their characteristics and 
system constraints in CFC environments. By balancing workload distribu- 
tion between cloud and fog resources and adapting to changing workload 
conditions, HSMAO optimizes resource utilization and reduces queuing 
delays, resulting in faster task completion and enhanced system respon- 
siveness. These results highlight the superior performance of HSMAO in 
achieving low response times and improving overall system efficiency in 
CFC environments (Figure 11.5). 


11.4.2 Discussions 


In the context of resource utilization, response time, and energy consump- 
tion, our proposed approach employing the HSMAO algorithm has shown 
remarkable efficacy compared to existing optimization algorithms such as 
MAO, HSO, and ACO. First, in terms of resource utilization, HSMAO 
optimizes the allocation of tasks across cloud and fog computing resources, 
achieving higher levels of resource utilization compared to other algorithms.
By dynamically adapting to changing workload conditions and system
constraints, HSMAO ensures efficient utilization of computational
resources, minimizing idle time and maximizing throughput.

Figure 11.5 Calculation of response time.

Regarding response time, HSMAO significantly reduces task response 
times by intelligently scheduling tasks based on their characteristics and 
system dynamics. Through its adaptive nature and robust optimization 
techniques, HSMAO minimizes queuing delays and optimizes task execu- 
tion, resulting in faster completion times and improved system respon- 
siveness. This is particularly beneficial for time-sensitive applications and 
services, where reduced response times are crucial for meeting performance 
requirements and enhancing user experience. 

Moreover, in terms of energy consumption, HSMAO achieves notable 
reductions in energy usage compared to existing algorithms. By optimizing 
TS decisions to minimize energy consumption while meeting performance 
objectives, HSMAO effectively reduces the overall energy footprint of CFC 
systems. Through its dynamic allocation of tasks and efficient resource 
utilization strategies, HSMAO minimizes unnecessary energy expenditure
and promotes sustainability in CFC environments. 


11.5 CONCLUSION AND FUTURE WORK 


In conclusion, our study has proposed a novel approach for TS in CFC 
environments, leveraging the HSMAO algorithm. Through comprehensive 
evaluations and comparisons with existing optimization algorithms such as
MAO, HSO, and ACO, we have demonstrated the effectiveness of HSMAO
in optimizing resource utilization, response time, and energy consumption. 
HSMAO dynamically allocates tasks based on their characteristics and 
system constraints, achieving higher levels of resource utilization, lower 
response times, and reduced energy consumption compared to other algo- 
rithms. The results underscore the significance of HSMAO in enhancing 
the efficiency, performance, and sustainability of CFC systems. Moving 
forward, further research can explore additional optimization objectives, 
scalability considerations, and real-world deployments to fully harness the 
potential of HSMAO in addressing the evolving challenges of TS in CFC 
environments.